Abstract

The central result of classical game theory states that every finite normal form game has 
a Nash equilibrium, provided that players are allowed to use randomized (mixed) strategies. 
However, in practice, humans are known to be bad at generating random-like sequences, and 
true random bits may be unavailable. Even if the players have access to enough random bits
for a single instance of the game, their randomness might be insufficient if the game is played
many times.

In this work, we ask whether randomness is necessary for equilibria to exist in finitely repeated
games. We show that for a large class of games containing arbitrary two-player zero-sum
games, approximate Nash equilibria of the n-stage repeated version of the game exist if and only
if both players have Ω(n) random bits. In contrast, we show that there exists a class of games
for which no equilibrium exists in pure strategies, yet the n-stage repeated version of the game
has an exact Nash equilibrium in which each player uses only a constant number of random bits.

When the players are assumed to be computationally bounded, if cryptographic pseudorandom
generators (or, equivalently, one-way functions) exist, then the players can base their
strategies on “random-like” sequences derived from only a small number of truly random bits.
We show that, in contrast, in repeated two-player zero-sum games, if pseudorandom generators
do not exist, then Ω(n) random bits remain necessary for equilibria to exist.
1 Introduction


The signature result of classical game theory states that a Nash equilibrium exists in every finite 
normal form game, provided that players are allowed to play randomized (mixed) strategies. It 
is easy to see in some games (e.g. Rock-Paper-Scissors) that randomization is necessary for the 
existence of Nash equilibrium. However, the assumption that players are able to randomize their 
strategies in an arbitrary manner is quite strong, as sources of true randomness may be unavailable 
and humans are known to be bad at generating random-like sequences. 

Motivated by these considerations, Budinich and Fortnow [BF11] investigated the question of
whether Nash equilibria exist when players only have access to limited randomness. Specifically,
they looked at “repeated matching pennies.” Matching pennies is a very simple, two-player,
two-action, zero-sum game in which the unique equilibrium is for each player to flip a fair coin and
play an action uniformly at random. If the game is repeated for n stages, then the unique Nash
equilibrium is for each player to play an independent, uniformly random action in each of the n
stages. Budinich and Fortnow considered the case where the players only have access to ≪ n bits
of randomness, which are insufficient to play the unique equilibrium of the game, and showed that
there does not even exist an approximate equilibrium (where the approximation depends on the 
deficiency in randomness). That is, if the players cannot choose independent, uniformly random 
actions in each of the n stages, then no approximate equilibrium exists. 

In this work, we further investigate the need for randomness in repeated games by asking 
whether the same results hold for arbitrary games. That is, we start with an arbitrary multi-player 
game such that Nash equilibria only exist if players can use β bits of randomness. Then we consider
the n-stage repetition of that game. Do equilibria exist in the n-stage game if players only have
access to ≪ βn bits of randomness? First, we show that the answer is essentially no for arbitrary
zero-sum games, significantly generalizing the results of Budinich and Fortnow. On the other hand, 
we show that the answer is yes for a large class of general games. 

These results hold when both players are assumed to be computationally unbounded. As noted 
by Budinich and Fortnow, if we assume that the players are required to run in polynomial time, 
and cryptographic pseudorandom generators (or, equivalently, one-way functions) exist, then a 
player equipped with only ≪ n truly random bits can generate n pseudorandom bits that appear
truly random to a polynomial time adversary. Thus, in the computationally bounded regime, if 
pseudorandom generators exist, then linear randomness is not necessary. We show that, in contrast, 
in arbitrary repeated two-player zero-sum games, if pseudorandom generators do not exist, then 
linear randomness remains necessary. 

1.1 Our Results 

Suppose we have an arbitrary finite strategic game among k players. We consider the n-stage
repetition of this game in which in each of the n consecutive stages, each of the k players
simultaneously chooses an action (which may depend on the history of the previous stages). We assume
that in the 1-stage game β > 0 bits of randomness for each player are necessary and sufficient for an
equilibrium to exist. We ask whether or not the existence of approximate equilibria in the n-stage
game requires a linear amount of randomness (Ω(n) bits) per player.

The case of computationally unbounded players. Our first set of results concerns players 
who are computationally unbounded, which is the standard model in classical game theory. In 
this setting, our first result shows that linear randomness is necessary for a large class of games
including every two-player zero-sum game. 

Theorem 1 (informal). For any k-player strategic game in which every Nash equilibrium achieves
the minmax payoff profile, in any Nash equilibrium of its repeated version the players’ strategies 
use randomness at least linear in the number of stages. 

An important subset of strategic games where any Nash equilibrium achieves the minmax payoff 
profile is the class of two-player zero-sum games where, as implied by von Neumann’s minmax
theorem, the concept of Nash equilibrium collapses to the minmax solution. Hence, to play a Nash 
equilibrium in any finitely repeated two-player zero-sum game the players must use randomness at 
least linear in the number of stages. 

Second, we show that the above results cannot be extended to arbitrary games. That is, there 
exists a class of strategic games that, in their repeated version, admit “randomness efficient” Nash 
equilibria: 

Theorem 2 (informal). For any k-player strategic game in which for every player there exists a
Nash equilibrium that gives him a strictly higher expected payoff than his minmax payoff, there
exists a Nash equilibrium of its repeated version where the players use total randomness independent
of the number of stages.

As we shall see, this result is related to the “finite horizon Nash folk theorem,” which roughly states 
that in finitely repeated games every payoff profile in the stage game that dominates the minmax 
payoff profile can be achieved as a payoff profile of some Nash equilibrium of the repeated game.

The case of computationally efficient players. For repeated two-player zero-sum games we 
study the existence of Nash equilibria with limited randomness when the players are computationally
bounded. Under the assumption that one-way functions do not exist (see the above discussion),
we show that it is possible to efficiently exploit any opponent (i.e., gain a non-negligible advantage 
over the value of the stage game) that uses low randomness in every repeated two-player zero-sum 
game. Hence, in repeated two-player zero-sum games there are no computational Nash equilibria 
in which one of the players uses randomness sub-linear in the number of the stages. 

Theorem 4 (informal). In any repeated two-player zero-sum game, if one-way functions do not
exist, then for any strategy of the column player using sub-linear randomness, there is a
computationally efficient strategy for the row player that achieves an average payoff non-negligibly higher
than his minmax payoff in the stage game.

The proof of this result employs the algorithm of Naor and Rothblum [NR06] for learning adaptively
changing distributions. The main idea is to adaptively reconstruct the small randomness used by the 
opponent in order to render his strategy effectively deterministic and then improve the expectation 
by playing the best response. 
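To make the idea of reconstructing the opponent's randomness concrete, the following Python sketch considers repeated matching pennies (the row player wins on a match) against a column player whose strategy is a deterministic function of the history and a secret seed of k bits. It is not the Naor and Rothblum algorithm: the row player simply keeps every seed consistent with the observed play and matches the majority prediction, which takes time exponential in k and therefore only illustrates the computationally unbounded case. The names `low_randomness_column`, `halving_row_player` and `SEED_BITS` are hypothetical.

```python
from typing import Callable, List, Tuple

History = List[Tuple[int, int]]              # (row action, column action) per stage; 0 = H, 1 = T
ColStrategy = Callable[[History, int], int]  # (history, seed) -> column action

SEED_BITS = 4  # hypothetical amount of randomness available to the column player


def low_randomness_column(history: History, seed: int) -> int:
    """Hypothetical column strategy: reuses its seed bits cyclically, XOR-ed with the row's past parity."""
    t = len(history)
    bit = (seed >> (t % SEED_BITS)) & 1
    row_parity = sum(row for row, _ in history) % 2
    return bit ^ row_parity


def halving_row_player(col_strategy: ColStrategy, seed_bits: int, true_seed: int, n: int) -> float:
    """Average payoff of the row player (the matcher) over n stages of matching pennies.

    The row player keeps every seed consistent with the column's observed actions and
    plays the majority prediction. After a wrong prediction at most half of the seeds
    survive, and the true seed always survives, so at most `seed_bits` stages are lost.
    The running time is exponential in `seed_bits` (a computationally unbounded strategy).
    """
    consistent = list(range(2 ** seed_bits))
    history: History = []
    total = 0
    for _ in range(n):
        predictions = [col_strategy(history, s) for s in consistent]
        row_action = 1 if 2 * sum(predictions) > len(predictions) else 0  # majority vote
        col_action = col_strategy(history, true_seed)
        total += 1 if row_action == col_action else -1                  # row wins on a match
        consistent = [s for s, p in zip(consistent, predictions) if p == col_action]
        history.append((row_action, col_action))
    return total / n
```

With k seed bits the row player's average payoff is at least (n − 2k)/n, which tends to 1 rather than to the value 0 of the stage game whenever k is sub-linear in n.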

Strong exploitation of low-randomness players. In the classical setting, i.e., without restrictions
on the computational power of the players, it was shown by Neyman and Okada [NO00]
that in every repeated two-player zero-sum game it is possible to extract utility proportional to 
the randomness deficiency of the opponent. On the other hand, our result in the setting with 
computationally efficient players guarantees only a non-negligible advantage in the presence of a 
low-randomness opponent. This leaves open the intriguing question of how much utility one can
efficiently extract from an opponent that uses low randomness in a repeated two-player zero-sum
game (see Section 5 for additional discussion).

The case of matching pennies. As noticed by Budinich and Fortnow [BF11], the repeated
game of matching pennies exhibits clear tradeoffs between the randomness available to players and
the existence of ε-Nash equilibria. Our work generalizes their results already in the context of repeated
matching pennies, since they assumed that the players randomize their strategies by flipping a limited
number of coins, whereas we only assume that the players’ strategies are of low entropy. Our results
for the game of matching pennies are provided in Appendix B.

1.2 Other Related Work 

In one of the first works to consider the relation between the randomness available to players and the
existence of equilibria, Halpern and Pass [HP14] introduced a computational framework of machine
games that explicitly incorporates the cost of computation into the utility functions of the players
and specifically the possibility of randomness being expensive. They demonstrated this approach on
the game of Rock-Paper-Scissors, and showed that in machine games where randomization is costly,
Nash equilibria do not necessarily exist. However, in machine games where randomization is
free, Nash equilibria always exist.

Based on derandomization techniques, Kalyanaraman and Umans [KU07] proposed randomness
efficient algorithms both for finding equilibria and for playing strategic games. In the context of 
finitely repeated two-player zero-sum games where one of the players (referred to as the learner) is 
uninformed of the payoff matrix, they gave an adaptive on-line algorithm for the learner that can 
reuse randomness over the stages of the repeated game. 

Halprin and Naor [HN10] suggested the possibility of using randomness generated by human
players in repeated games for generation of pseudorandom sequences. The strategic game they 
proposed for this purpose is a zero-sum two-player game. As shown by our results, their choice 
improves the likelihood of extracting truly random bits from the gameplay, since the players must 
use linear randomness in the number of stages in equilibria of any repeated two-player zero-sum 
game. 

2 Notation and Background 

2.1 Game Theoretic Background 

Here we provide the concepts from game theory that we use in this work (for an in-depth study see
the classical text by Osborne and Rubinstein [OR94]).

Definition 1 (strategic game). A strategic game $G = (N, (A_i), (u_i))$ is a tuple consisting of

• a finite set of players $N$

• for each player $i \in N$ a nonempty set of actions $A_i$

• for each player $i \in N$ a utility function $u_i : A \to \mathbb{R}$ assigning each action profile $a \in A = \times_{j \in N} A_j$ a real-valued payoff $u_i(a)$.
In the special case when G is a two-player zero-sum game we use the notation $((A_1, A_2), u)$
instead of $(\{1, 2\}, (A_1, A_2), (u_1, u_2))$, since there are only two players and $u_1(a) = -u_2(a)$ for all
$a \in A_1 \times A_2$. We refer to player 1 as the row player (also known as Rowena) and to player 2 as the
column player (also known as Colin).¹

We denote by $S_i$ the set of mixed strategies of player i, i.e., the set $\Delta(A_i)$ of all probability
distributions on the action space of player i. For a strategy profile $\sigma \in S = \times_{j \in N} S_j$ we use $\sigma_i$ to
denote the strategy of player i in σ and $\sigma_{-i}$ to denote the profile of strategies of all the players in
N except for player i in σ, and we write σ equivalently as $(\sigma_i, \sigma_{-i})$.

Definition 2 (Nash equilibrium in strategic game). A Nash equilibrium of a strategic game
$(N, (A_i), (u_i))$ is a profile σ of strategies with the property that for every player $i \in N$ we have

$$\mathbb{E}[u_i(\sigma_i, \sigma_{-i})] \ge \mathbb{E}[u_i(\sigma'_i, \sigma_{-i})] \quad \text{for all } \sigma'_i \in S_i .$$

Definition 3 (minmax payoff). The minmax payoff of player i in strategic game $(N, (A_i), (u_i))$,
denoted $v_i$, is the lowest payoff that the other players can force upon player i, i.e.,

$$v_i = \min_{\sigma_{-i} \in S_{-i}} \max_{\sigma_i \in S_i} \mathbb{E}[u_i(\sigma_i, \sigma_{-i})] .$$

A minmax strategy of player i in G is a strategy $\hat{\sigma}_i \in S_i$ such that $\mathbb{E}[u_i(\hat{\sigma}_i, \sigma_{-i})] \ge v_i$ for all
$\sigma_{-i} \in S_{-i}$.
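As a small illustration of Definition 3, the Python sketch below computes $v_i$ exactly in the special case of two players where the opponent has only two actions; the function name and input format are our own. Since the inner maximum over $\sigma_i$ is attained at a pure action, the objective is a convex piecewise-linear function of the opponent's mixing probability, so it suffices to evaluate it at the endpoints and at the crossing points of pairs of lines.

```python
from fractions import Fraction
from itertools import combinations
from typing import List, Tuple


def minmax_vs_two_actions(payoff: List[Tuple[int, int]]) -> Fraction:
    """Minmax payoff v_i when player i faces a single opponent with two actions b0, b1.

    payoff[k] = (u_i(a_k, b0), u_i(a_k, b1)) for each action a_k of player i.
    If the opponent plays b1 with probability q, player i's best reply earns
    f(q) = max_k (1 - q) * x_k + q * y_k, a convex piecewise-linear function of q,
    so its minimum over [0, 1] is at q in {0, 1} or where two of the lines cross.
    """
    def f(q: Fraction) -> Fraction:
        return max((1 - q) * x + q * y for x, y in payoff)

    candidates = [Fraction(0), Fraction(1)]
    for (x1, y1), (x2, y2) in combinations(payoff, 2):
        slope_gap = (y1 - x1) - (y2 - x2)
        if slope_gap != 0:
            q = Fraction(x2 - x1) / slope_gap  # solves x1 + q(y1 - x1) = x2 + q(y2 - x2)
            if 0 <= q <= 1:
                candidates.append(q)
    return min(f(q) for q in candidates)
```

For matching pennies, `minmax_vs_two_actions([(1, -1), (-1, 1)])` returns 0, attained when the opponent mixes uniformly.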

Definition 4 (feasible and individually rational payoff profile). An individually rational payoff
profile of G is a vector $p \in \mathbb{R}^{N}$ that weakly dominates the minmax payoff of every player, i.e.,
a vector for which $p_i \ge v_i$ for all $i \in N$. A vector $p \in \mathbb{R}^{N}$ is a feasible payoff profile of G if
there exists a collection $\{\alpha_a\}_{a \in A}$ of nonnegative rational numbers such that $\sum_{a \in A} \alpha_a = 1$ and

$$p_i = \sum_{a \in A} \alpha_a u_i(a) \quad \text{for all } i \in N .$$

Note that since in every finite strategic game a Nash equilibrium always exists, there also always 
exists an individually rational payoff profile (the payoff profile of the Nash equilibrium). However, 
the Nash equilibrium payoff profile is not necessarily feasible in the above sense. 

Definition 5 (n-stage repeated game). Let $G = (N, (A_i), (u_i))$ be a strategic game. An n-stage
repeated game of G is an extensive form game with perfect information and simultaneous moves
$G^n = (N, H, P, (u_i^*))$ in which:

• $H = \{\emptyset\} \cup \bigcup_{t=1}^{n} A^t$, where ∅ is the initial history and $A^t$ is the set of sequences of action profiles in G of length t

• $P(h) = N$ for each non-terminal history $h \in H$

• $u_i^*(a^1, \dots, a^n) = \frac{1}{n} \sum_{t=1}^{n} u_i(a^t)$ for every terminal history $(a^1, \dots, a^n) \in A^n$.

A behavioral strategy of player i is a collection $(\sigma_i(h))_{h \in H \setminus A^n}$ of independent probability measures
(one for each non-terminal history), where each $\sigma_i(h)$ is a probability measure over $A_i$.

Definition 6 (Nash equilibrium in n-stage repeated game). A Nash equilibrium of an n-stage
repeated game of $G = (N, (A_i), (u_i))$ is a profile σ of behavioral strategies with the property that
for every player $i \in N$ and every behavioral strategy $\sigma'_i$, we have

$$\mathbb{E}[u_i^*(\sigma_i, \sigma_{-i})] \ge \mathbb{E}[u_i^*(\sigma'_i, \sigma_{-i})] .$$

¹We have adopted Colin and Rowena from Aumann and Hart [AH03].
2.2 Cryptographic Background

Pseudorandom generators and one-way functions. The notion of cryptographic pseudorandom
generators was introduced by Blum and Micali [BM84], who defined them as algorithms that
produce sequences of bits unpredictable in polynomial time, i.e., no efficient next-bit test is able
to predict the next output of the pseudorandom generator given the sequence of bits generated
so far. As Yao [Yao82] showed, this is equivalent to a generator whose output is indistinguishable
from a truly random string to any polynomial time observer. One of the central questions
in cryptography is to understand the assumptions that are sufficient and necessary for implementing
a particular cryptographic task. Impagliazzo and Luby [IL89] (see also Impagliazzo [Imp92])
showed that one-way functions are essential for many cryptographic primitives (e.g., private-key
encryption, secure authentication, coin-flipping over telephone). Hastad, Impagliazzo, Levin and
Luby [HILL99] showed that pseudorandom generators exist if and only if one-way functions exist.
Therefore the existence of one-way functions is the major open problem of cryptography. For an
in-depth discussion see Goldreich [Gol01].


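Standard notation. A function $\mu : \mathbb{N} \to \mathbb{R}^{+}$ is negligible if for all $c \in \mathbb{N}$ there exists $n_c \in \mathbb{N}$ such
that for all $n \ge n_c$, $\mu(n) < n^{-c}$. A function $\mu : \mathbb{N} \to \mathbb{R}^{+}$ is noticeable if there exist $c \in \mathbb{N}$ and
$n_c \in \mathbb{N}$ such that for all $n \ge n_c$, $\mu(n) \ge n^{-c}$.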

Definition 7 (statistical distance). The statistical distance between two distributions X and Y
over $\{0,1\}^\ell$, denoted by $\mathrm{SD}(X, Y)$, is defined as:

$$\mathrm{SD}(X, Y) = \frac{1}{2} \sum_{\alpha \in \{0,1\}^\ell} \left| \Pr[X = \alpha] - \Pr[Y = \alpha] \right| .$$
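A direct transcription of Definition 7 in Python, with distributions given as dictionaries from outcomes to exact rational probabilities (a representation chosen here only for illustration):

```python
from fractions import Fraction
from typing import Dict, Hashable


def statistical_distance(p: Dict[Hashable, Fraction], q: Dict[Hashable, Fraction]) -> Fraction:
    """SD(X, Y) = 1/2 * sum over outcomes alpha of |Pr[X = alpha] - Pr[Y = alpha]|.

    Outcomes missing from a dictionary are treated as having probability 0.
    """
    support = set(p) | set(q)
    return Fraction(1, 2) * sum(abs(p.get(x, Fraction(0)) - q.get(x, Fraction(0))) for x in support)
```

For example, the uniform distribution on $\{0,1\}^2$ is at statistical distance 1/2 from the uniform distribution on {00, 11}.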

The most fundamental notion for measuring randomness is the Shannon entropy: 

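Definition 8 (Shannon entropy). Given a probability distribution $p \in \Delta(A)$, the Shannon entropy
of p is defined as

$$H(p) = \sum_{a \in A} p(a) \log_2 \frac{1}{p(a)} ,$$

with the convention that outcomes of probability 0 contribute 0 to the sum.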
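The following Python sketch evaluates Definition 8 with base-2 logarithms, so that entropy is measured in bits; a fair coin has entropy exactly 1 and the uniform distribution over four actions has entropy 2.

```python
import math
from typing import Iterable


def shannon_entropy(probs: Iterable[float]) -> float:
    """H(p) = sum_a p(a) * log2(1 / p(a)) in bits; zero-probability outcomes contribute 0."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)
```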

As mentioned above, if one-way functions exist, then many cryptographic primitives are
possible; in particular, we can stretch a short random seed into a long seemingly random one. Hence,
we will be interested in the case that such functions do not exist.


Definition 9 (almost one-way function). A function f is an almost one-way function if it is
computable in polynomial time, and for infinitely many input lengths, for any PPTM M, the
probability that M inverts f on a random input is negligible. Namely, for any polynomial p, there
exist infinitely many choices of $n \in \mathbb{N}$ such that

$$\Pr_{x \leftarrow \{0,1\}^n,\, M}\left[ M(f(x)) \in f^{-1}(f(x)) \right] < \frac{1}{p(n)} .$$
3 Low-Entropy Nash Equilibria of Finitely Repeated Games


In this section we show that, in the setting with players that have unbounded computational power,
there are two classes of k-player strategic games at opposite ends of the spectrum with respect
to the amount of randomness necessary for equilibria of their repeated versions.

To measure the randomness of a player’s strategy we consider the maximal total Shannon 
entropy of his strategies used along any terminal history. 

Definition 10 (Shannon entropy of a strategy in repeated game). Let $G = (N, (A_i), (u_i))$ be a
finite strategic game and let $\sigma_i$ be a strategy of player i in the n-stage repeated game of G. For
any terminal history $a = (a^1, \dots, a^n) \in A^n$, let $(\sigma_i(\emptyset), \sigma_i(a^1), \sigma_i(a^1, a^2), \dots, \sigma_i(a^1, \dots, a^{n-1}))$ be
the n-tuple of strategies of player i in $\sigma_i$ at all the non-terminal subhistories of a. We define the
Shannon entropy of $\sigma_i$, denoted as $H(\sigma_i)$, as

$$H(\sigma_i) := \max_{a \in A^n} \left\{ H(\sigma_i(\emptyset)) + \sum_{j=1}^{n-1} H(\sigma_i(a^1, \dots, a^j)) \right\} .$$

This is a worst-case notion, in that it measures the entropy of the strategy of player i irrespective
of the strategies of the other players. For some of our results we consider an alternative variant, the
effective Shannon entropy of a strategy $\sigma_i$ in a strategy profile σ, i.e., the maximal total entropy of
$\sigma_i$ along terminal histories that are sampled in σ with non-zero probability.
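The following Python sketch computes both the worst-case entropy of Definition 10 and the effective variant by brute-force enumeration of all terminal histories, which is only feasible for tiny games. Behavioral strategies are modeled, by assumption, as functions from a history to a distribution over the player's own actions; the example strategies `punish_if_tails` and `always_heads` are hypothetical and chosen to show that the two notions can differ.

```python
import math
from itertools import product
from typing import Callable, Dict, Sequence, Tuple

Profile = Tuple[str, ...]                           # one action per player
History = Tuple[Profile, ...]
Behavioral = Callable[[History], Dict[str, float]]  # history -> distribution over own actions


def entropy(dist: Dict[str, float]) -> float:
    return sum(p * math.log2(1 / p) for p in dist.values() if p > 0)


def strategy_entropy(strategies: Sequence[Behavioral], actions: Sequence[Sequence[str]],
                     i: int, n: int, effective: bool = False) -> float:
    """Maximum over terminal histories of the summed stage entropies of player i (Definition 10).

    With effective=True, only terminal histories reached with positive probability
    under the whole profile `strategies` are taken into account.
    """
    best = 0.0
    for path in product(product(*actions), repeat=n):  # every terminal history in A^n
        total, reachable = 0.0, True
        for t in range(n):
            h = path[:t]
            total += entropy(strategies[i](h))
            if effective and any(strategies[j](h).get(path[t][j], 0.0) == 0.0
                                 for j in range(len(strategies))):
                reachable = False
        if reachable:
            best = max(best, total)
    return best


# Hypothetical example strategies for repeated matching pennies (player 0 = row, 1 = column).
def uniform(_h: History) -> Dict[str, float]:
    return {"H": 0.5, "T": 0.5}


def punish_if_tails(h: History) -> Dict[str, float]:
    # Randomizes only after the column player has played T at least once.
    return uniform(h) if any(profile[1] == "T" for profile in h) else {"H": 1.0}


def always_heads(_h: History) -> Dict[str, float]:
    return {"H": 1.0}


ACTIONS = [["H", "T"], ["H", "T"]]
```

Against an opponent who never plays T, the worst-case entropy of `punish_if_tails` over three stages is 2 bits, whereas its effective entropy is 0, since the randomizing histories are never reached.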

For the restricted class of games in which any Nash equilibrium payoff profile is exactly the
minmax payoff profile (e.g., any two-player zero-sum game), the following proposition relates the
Nash equilibria of the strategic game to the structure of Nash equilibria in its n-stage repeated
version.²

Proposition 1. Let $G = (N, (A_i), (u_i))$ be a strategic game such that any Nash equilibrium payoff
profile is equal to the minmax payoff profile. For all $n \in \mathbb{N}$, if σ is a Nash equilibrium of $G^n =
(N, H, P, (u_i^*))$, the n-stage repeated game of G, then for every non-terminal history $h \in H$ sampled
with non-zero probability by σ the strategy profile σ(h) is a Nash equilibrium of G.

Proof. Assume to the contrary that there exists a Nash equilibrium σ of $G^n$ such that for some
non-terminal history $h \in H$, sampled with non-zero probability by σ, the strategy profile σ(h) is
not a Nash equilibrium of G. Let h be without loss of generality the longest history such that σ(h)
is not a Nash equilibrium of G. There exists a player i with a profitable deviation $\sigma_i^*$ in the stage
game to his strategy in the strategy profile σ(h). Consider the strategy $\sigma'_i$ of player i in $G^n$ defined
in the following way: $\sigma'_i(h') = \sigma_i(h')$ for any history $h' \in H$ that does not contain h as a subhistory,
$\sigma'_i(h) = \sigma_i^*$ for the history h, and $\sigma'_i(h'')$ is the minmax strategy $\hat{\sigma}_i$ of player i in G for any history
$h'' \neq h$ that contains h as a subhistory.

Note that for any history $h' \in H$ that does not contain h as a subhistory, $\mathbb{E}[u_i((\sigma'_i, \sigma_{-i})(h'))] =
\mathbb{E}[u_i((\sigma_i, \sigma_{-i})(h'))]$ by the construction of $\sigma'_i$. Since the minmax strategy $\hat{\sigma}_i$ of player i guarantees
at least the minmax payoff $v_i$ (equal to any Nash equilibrium payoff of player i in G),
$\mathbb{E}[u_i((\sigma'_i, \sigma_{-i})(h''))] \ge \mathbb{E}[u_i((\sigma_i, \sigma_{-i})(h''))]$ for any history $h'' \neq h$ that contains h as a subhistory.
Finally, $\mathbb{E}[u_i((\sigma'_i, \sigma_{-i})(h))] > \mathbb{E}[u_i((\sigma_i, \sigma_{-i})(h))]$ because $\sigma_i^*$ is a profitable deviation for player i in
G given the strategy profile σ(h).

²A variant of Proposition 1 with respect to pure equilibria is given in Osborne and Rubinstein [OR94] as Proposition
155.1.


Recall that the history h is sampled in σ with non-zero probability, and hence $\mathbb{E}[u_i^*(\sigma'_i, \sigma_{-i})] >
\mathbb{E}[u_i^*(\sigma_i, \sigma_{-i})]$, i.e., the alternative strategy increases the expectation of player i in $G^n$ given that
the other players follow $\sigma_{-i}$, a contradiction to σ being a Nash equilibrium of $G^n$. □

For strategic games from this class, Proposition 1 immediately gives a linear lower bound on the
entropy needed to play Nash equilibria in their repeated games.

Theorem 1. Let G be a strategic game such that any Nash equilibrium payoff profile is equal to
the minmax payoff profile. For all $n \in \mathbb{N}$ and every player $i \in N$, if in any Nash equilibrium of G
the strategy of player i is of entropy at least $\beta_i$, then in any Nash equilibrium of the n-stage repeated
game of G the strategy of player i is of entropy at least $n\beta_i$.

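Proof. Assume to the contrary that there exists a Nash equilibrium σ of the n-stage repeated game
of G with strategy of entropy strictly smaller than $n \cdot \beta_i$ for player i. By Proposition 1, σ(h) is a
Nash equilibrium of G for all h sampled by σ with non-zero probability. Take any terminal history
sampled with non-zero probability in σ: the entropies of $\sigma_i$ at its n non-terminal subhistories sum to
less than $n \cdot \beta_i$, so at least one of them is below $\beta_i$. Hence, there must exist a history $h^* \in H$
sampled with non-zero probability in σ such that $\sigma(h^*)$ is a Nash equilibrium of G and the entropy
$H(\sigma_i(h^*))$ of $\sigma_i(h^*)$ is strictly smaller than $\beta_i$, a contradiction. □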



                Left (L)   Heads (H)   Tails (T)   Right (R)
Up (U)           0, -1      0, -1       0, -1       0, 0
Heads (H)        0, -1      1, -1      -1, 1       -1, 0
Tails (T)        0, -1     -1, 1        1, -1      -1, 0
Down (D)         0, 0      -1, 1       -1, 1        1, 0

Figure 1: The payoff matrix of an extended game of matching pennies.


Repeated non-zero-sum game requiring a lot of randomness. Theorem 1 applies not
only to two-player zero-sum games but also to some non-zero-sum games. The game G given
by the payoff matrix in Figure 1 is a variant of the game of matching pennies where the players
have two additional options. There are three mixed Nash equilibria in G: $(\frac{1}{2}H + \frac{1}{2}T, \frac{1}{2}H + \frac{1}{2}T)$,
$(\frac{1}{2}U + \frac{1}{2}D, \frac{1}{2}H + \frac{1}{2}R)$, and $(\frac{1}{2}U + \frac{1}{2}D, \frac{1}{2}T + \frac{1}{2}R)$; all three Nash equilibria achieve the same
payoff profile (0, 0) and require each player to use one random bit. Notice that the row player can
get utility 0 irrespective of the strategy of the column player by selecting his action “Up”, and
similarly the column player can ensure utility 0 by playing “Right”. Hence, the minmax payoff
profile is (0, 0). Since none of the three Nash equilibria of G improves over the minmax payoff
profile, we get by Theorem 1 that each player must use a strategy of entropy at least n in any Nash
equilibrium of the n-stage repeated game of G.
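The claims about Figure 1 can be checked mechanically. The following Python sketch (the names `PAYOFF`, `expected` and `is_nash` are ours) encodes the payoff matrix, verifies with exact rational arithmetic that each of the three profiles listed above admits no profitable deviation to a pure action and yields the payoff profile (0, 0), and confirms that “Up” and “Right” secure payoff 0 for the row and column player, respectively.

```python
from fractions import Fraction

ROWS = ["U", "H", "T", "D"]
COLS = ["L", "H", "T", "R"]
# Figure 1: PAYOFF[row][col] = (row player's payoff, column player's payoff)
PAYOFF = {
    "U": {"L": (0, -1), "H": (0, -1), "T": (0, -1), "R": (0, 0)},
    "H": {"L": (0, -1), "H": (1, -1), "T": (-1, 1), "R": (-1, 0)},
    "T": {"L": (0, -1), "H": (-1, 1), "T": (1, -1), "R": (-1, 0)},
    "D": {"L": (0, 0), "H": (-1, 1), "T": (-1, 1), "R": (1, 0)},
}


def expected(row_mix, col_mix, player):
    """Expected payoff of `player` (0 = row, 1 = column) under independent mixed strategies."""
    return sum(pr * pc * PAYOFF[r][c][player]
               for r, pr in row_mix.items() for c, pc in col_mix.items())


def is_nash(row_mix, col_mix):
    """Ruling out profitable deviations to pure actions suffices for mixed deviations too."""
    v_row, v_col = expected(row_mix, col_mix, 0), expected(row_mix, col_mix, 1)
    return (all(expected({r: 1}, col_mix, 0) <= v_row for r in ROWS)
            and all(expected(row_mix, {c: 1}, 1) <= v_col for c in COLS))


half = Fraction(1, 2)
EQUILIBRIA = [
    ({"H": half, "T": half}, {"H": half, "T": half}),
    ({"U": half, "D": half}, {"H": half, "R": half}),
    ({"U": half, "D": half}, {"T": half, "R": half}),
]
```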

Repeated non-zero-sum game requiring low randomness. On the other hand, there are
strategic games to which Theorem 1 does not apply, and in whose n-stage repeated game the players
may use equilibrium strategies of entropy proportional only to the entropy needed in the single-shot
game.
[Payoff matrix over the actions Cooperate (C), Heads (H), Tails (T), and Punish (P) of each player.]

Figure 2: The payoff matrix of an extended game of matching pennies.


Consider for example the strategic game G given by the payoff matrix in Figure 2. The strategy
profile $\sigma = (\frac{1}{2}H + \frac{1}{2}T, \frac{1}{2}H + \frac{1}{2}T)$ is the unique Nash equilibrium of G that achieves payoff profile
(0, 0). The minmax payoff profile is (−3, −3), since any player can get utility at least −3 by playing
C. We show that the n-stage repeated game of G admits a Nash equilibrium that requires only a
single random coin, i.e., the same amount of randomness as the Nash equilibrium σ of the stage
game G. Consider the strategy profile in which both players play C in the first n − 1 rounds and in
the last round each player plays H and T with equal probability, and if any player deviates from
playing C in one of the first n − 1 rounds then the opponent plays P throughout all the remaining
stages. To see that this strategy profile is a Nash equilibrium of the n-stage repeated game of G,
note that any deviation from playing C in the first n − 1 rounds can increase the utility of any
player by at most 3 (by playing either H or T instead of C); however, the subsequent punishment
induces a loss of at least 3, which renders any deviation unprofitable.
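A quick arithmetic check of this argument, written as a Python sketch. It uses only the numbers stated in the text: 3 per stage under (C, C), 0 in expectation in the final stage, and a one-stage gain of at most 3 from deviating. The assumption that each punished stage yields the deviator at most the minmax level −3 is ours, and the function names are hypothetical.

```python
def cooperative_total(n: int) -> int:
    """Total payoff on the equilibrium path: (C, C) pays 3 in each of the first n - 1
    stages and the final (1/2 H + 1/2 T) stage pays 0 in expectation."""
    return 3 * (n - 1)


def deviation_total_bound(n: int, t: int, punished: int = -3) -> int:
    """Upper bound on the total of a player who first leaves C at stage t (1 <= t <= n - 1).

    Assumes the deviation stage pays at most 3 + 3 = 6 and, hypothetically, that each
    of the remaining n - t stages pays at most `punished` (the minmax level -3).
    """
    return 3 * (t - 1) + 6 + punished * (n - t)
```

The difference between the two totals is 6(n − t − 1), so a deviation in stage n − 1 at best breaks even and every earlier deviation is strictly worse.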

The randomness efficient Nash equilibrium from the above example resembles the structure of 
Nash equilibria constructed in the proof of the Nash folk theorem for finitely repeated games. This 
theorem characterizes the payoff profiles that can be achieved by Nash equilibria of the repeated 
game. In particular, it shows that in strategic games G such that for every player i there exists a
Nash equilibrium $\sigma^i$ strictly improving over his minmax payoff, any feasible payoff profile (i.e., any
convex combination of payoff profiles in G with rational coefficients) that is individually rational 
(i.e., achieves at least the minmax level for every player) can be approximated by a Nash equilibrium 
of sufficiently long finitely repeated game of G (cf. Osborne and Rubinstein [OR94] for a survey of 
known folk theorems). 

The main idea behind the proof of the folk theorem is that for every player i the gap between
his payoff in the Nash equilibrium $\sigma^i$ and his minmax payoff $v_i$ can be used to punish the player in
case he deviates from the strategy that approximates any feasible and individually rational payoff
profile. In particular, in any such Nash equilibrium the players play according to some (possibly
mixed) Nash equilibria of the stage game only in a fixed number of final rounds (independent of the
number of stages n), and in all the preceding rounds they play pure strategies so that the overall
payoff approximates the feasible payoff profile. Hence, the amount of randomness on all the
equilibrium paths is independent of the number of stages in any such Nash equilibrium of the
repeated game.

Theorem 2. Let G be a strategic game such that for every player i there exists a Nash equilibrium
$\sigma^i$ of G in which the payoff of player i exceeds his minmax payoff $v_i$, and there exists a feasible
and individually rational payoff profile in G. Let $\beta_i$ be such that in any Nash equilibrium of G the
strategy of player i is of entropy at most $\beta_i$. There exists $c \in \mathbb{N}$ such that for all sufficiently large
$n \in \mathbb{N}$ and every player $i \in N$ there exists a Nash equilibrium of $G^n$, the n-stage repeated game of
G, in which the strategy of player i is of effective entropy at most $c \cdot \beta_i$.

Proof. Let $p \in \mathbb{R}^N$ be the feasible and individually rational payoff profile of G. There exist
coefficients $\{\alpha_a\}_{a \in A} \subset \mathbb{Q}$ such that $\sum_{a \in A} \alpha_a = 1$ and for all $i \in N$, $p_i = \sum_{a \in A} \alpha_a u_i(a)$. Let K be
the smallest integer such that each $\alpha_a$ can be written as $\alpha'_a / K$ for $\alpha'_a \in \mathbb{N}$. For some $\ell \in \mathbb{N}$, we
divide the stages in $G^n$ into two parts of length $\ell \cdot K$ and $m = n - \ell \cdot K$. Let s be a strategy profile in
$G^n$ that schedules the first $\ell \cdot K$ stages such that each action profile a for which $\alpha_a \neq 0$ is played by
the players in exactly $\ell \cdot \alpha'_a$ stages. In the remaining m stages the players cycle between
the Nash equilibria $\{\sigma^i\}_{i \in N}$, i.e., for all $j \in \{0, \dots, m-1\}$, at stage $n - m + 1 + j$ the players
play the Nash equilibrium $\sigma^{j'}$, where $j' = 1 + (j \bmod |N|)$. In case any player i deviates from s in
one of the first $\ell \cdot K$ rounds, the remaining players play the strategy that forces the minmax level
$v_i$ on player i in all the subsequent stages.

Note that if the number m of the last stages is such that for all action profiles $a \in A$ with
$\alpha_a \neq 0$ and for every player i:

$$\frac{m}{|N|} \left( \mathbb{E}[u_i(\sigma^i)] - v_i \right) > \max_{a'_i \in A_i} u_i(a'_i, a_{-i}) - u_i(a) ,$$

then no player has a profitable deviation and s is a Nash equilibrium of $G^n$. The number m of last
stages can be bounded by some constant c selected independently of n. Since the number of stages
in which the players play according to some Nash equilibrium of G is at most c (the players take
pure actions in all the first $n - c$ stages), for any player i the effective entropy of $s_i$ in s is at most
$c \cdot \beta_i$. □
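The scheduling step of the proof can be sketched in Python with exact rationals. The helper names `pure_schedule` and `split_stages` and the parameter `m_min` (a lower bound on the number m of final stages, standing in for the requirement on m above) are hypothetical; the sketch only lays out the pure part of s and does not construct the punishments.

```python
from fractions import Fraction
from math import lcm
from typing import Dict, Hashable, List, Tuple


def split_stages(n: int, K: int, m_min: int) -> Tuple[int, int]:
    """Pick ell so that the last m = n - ell*K stages satisfy m_min <= m < m_min + K."""
    ell = (n - m_min) // K
    return ell, n - ell * K


def pure_schedule(alpha: Dict[Hashable, Fraction], ell: int) -> List[Hashable]:
    """First ell*K stages: action profile a is played in exactly ell*alpha'_a of them,
    where K is the least common denominator and alpha'_a = alpha_a * K."""
    assert sum(alpha.values()) == 1 and all(w >= 0 for w in alpha.values())
    K = lcm(*(w.denominator for w in alpha.values()))
    schedule: List[Hashable] = []
    for a, w in alpha.items():
        schedule.extend([a] * (ell * int(w * K)))
    return schedule
```

With illustrative weights 1/2, 1/3 and 1/6 we get K = 6, and `pure_schedule` with ell = 2 lists the three profiles 6, 4 and 2 times. Choosing ell as in `split_stages` keeps m between `m_min` and `m_min + K - 1`, a bound independent of n.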


Randomness in Subgame Perfect Equilibria of Finitely Repeated Games. An unavoidable
shortcoming of the solution concept of Nash equilibrium in the context of repeated (and in
general extensive form) games is that it is possible for equilibria to be established based on
non-credible threats. This issue can be circumvented by the stronger requirement of subgame perfection,
which demands that the players’ strategies be best responses at every history (even off the equilibrium
path), and hence implicitly eliminates all empty threats.

Since any subgame perfect equilibrium is a Nash equilibrium, the linear lower bound on the
amount of entropy applies to subgame perfect equilibria when the minmax payoff profile cannot
be improved upon by any Nash equilibrium in the stage game. On the other hand, it is possible
to construct a randomness efficient subgame perfect equilibrium in the n-stage repeated game if
in the underlying game there are two Nash equilibria with different payoffs for each player. Such a
subgame perfect equilibrium is constructed in the proof of the perfect finite horizon folk theorem of
Benoit and Krishna [BK85].

Characterization of games with randomness efficient equilibria. The condition on the
structure of the stage game in Theorem 2 (i.e., that for every player there exists a Nash equilibrium
of the stage game that strictly improves over his minmax payoff) is the same as in the Nash folk
theorem of Benoit and Krishna [BK87]. We leave it as an open problem whether ideas from a proof
of a more general finite horizon Nash folk theorem (e.g., the one given by González-Díaz [Gon06])
could help extend (or characterize) the class of games that admit randomness efficient equilibria in
their repeated versions.
4 Low-Entropy Computational Nash Equilibria of Finitely Repeated Two-Player Zero-Sum Games

In this section we study randomness in equilibria of repeated two-player zero-sum games with
computationally efficient players. The solution concept we consider in this setting is computational
Nash equilibrium (introduced in the work of Dodis, Halevi and Rabin [DHR00]), which assumes
that the players are restricted to computationally efficient strategies and are indifferent to negligible
improvements in their utilities, i.e., a computational Nash equilibrium is analogous to the concept
of ε-Nash equilibrium with a negligible ε, where the players’ strategies, as well as any deviations,
must be computationally efficient.

To capture the requirement of computational efficiency, the players' strategies must be implemented
by families of polynomial-size circuits. For a two-player zero-sum game G, we denote by the
repeated game of G the infinite collection of all the n-stage repeated games of G. A family of
polynomial-size circuits {C_n}_{n∈ℕ} implements a strategy of a player in the repeated game of G
as follows. In G^n, the n-stage repeated game of G, the circuit C_n takes as input a string
corresponding to a non-terminal history h in G^n and s(n) random bits; it outputs an action to be
taken at history h. If the strategy of player i ∈ {1, 2} is implemented by the family {C_n^i}_{n∈ℕ}, then the
gameplay in the n-stage repeated game of G is defined in the following way: player i samples a
random string r_i ∈ {0,1}^{s(n)}, and at each stage of G^n takes the action a = C_n^i(h, r_i) ∈ A_i, given
that the history of play up to the current stage is h. The utility function u_i^* is, for all n, defined
as in the standard n-stage repeated game of G (i.e., it is the average utility achieved in the stage
game over the n stages).
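The gameplay just described can be made concrete with a small simulation. This is a minimal sketch under stated assumptions: strategies are plain Python functions `strategy(history, r)` standing in for the circuits C_n^i, the random string r_i is passed as an integer, and the stage game is given by Rowena's payoff matrix; the names `play_repeated`, `History` and `Strategy` are hypothetical and not taken from the text.

```python
import random
from typing import Callable, Sequence, Tuple

History = Tuple[Tuple[int, int], ...]      # ((row action, column action), ...)
Strategy = Callable[[History, int], int]   # (history h, random string r) -> action


def play_repeated(u: Sequence[Sequence[float]], n: int,
                  row: Strategy, s_row: int,
                  col: Strategy, s_col: int,
                  rng: random.Random) -> float:
    """Play the n-stage repeated game once and return Rowena's average utility.

    u[a1][a2] is Rowena's stage payoff; Colin receives -u[a1][a2].
    Each player draws its random string once, before the first stage.
    """
    r_row = rng.getrandbits(s_row) if s_row > 0 else 0   # r_1: s_row random bits
    r_col = rng.getrandbits(s_col) if s_col > 0 else 0   # r_2: s_col random bits
    history: History = ()
    total = 0.0
    for _ in range(n):
        a1 = row(history, r_row)     # plays the role of C_n^1(h, r_1)
        a2 = col(history, r_col)     # plays the role of C_n^2(h, r_2)
        total += u[a1][a2]
        history += ((a1, a2),)
    return total / n
```

Passing the random string once, as an integer, keeps the sketch close to the definition, in which the circuit receives the same s(n) random bits as part of its input at every history.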

Definition 11 (computational Nash equilibrium of repeated game). For a two-player zero-sum
game G = ((A_1, A_2), u), a computational Nash equilibrium of the repeated game of G is a strategy
profile ({C_n^1}_{n∈ℕ}, {C_n^2}_{n∈ℕ}) given by polynomial-size circuit families such that for every player
i ∈ {1, 2} and every strategy {D_n^i}_{n∈ℕ} given by a polynomial-size circuit family it holds for all large
enough n ∈ ℕ that

$$\mathbb{E}\big[u_i^*(C_n^i, C_n^{-i})\big] \ge \mathbb{E}\big[u_i^*(D_n^i, C_n^{-i})\big] - \varepsilon(n),$$

where ε is a negligible function.

We show that if one-way functions do not exist, then in repeated two-player zero-sum games 
there are no computational Nash equilibria in which the players’ strategies use random strings of 
length sub-linear in the number of the stages. 

Our result follows by showing that efficiently finding a best response to an opponent's strategy
that uses limited randomness can be seen as a special case of the problem of learning an adaptively
changing distribution (introduced by Naor and Rothblum [NR06]). The goal in their framework is
for a learner to recover a secret state used to sample a publicly observable distribution, in order to
be able to predict the next sample. In particular, this would allow the learner to be competitive
with someone who knows the secret state (Naor and Rothblum [NR06] considered this problem in
the context of an adversary trying to impersonate someone in an authentication protocol). In the
setting of repeated games, the random string used by the opponent's strategy can be thought of
as the secret state. Note that learning it at any non-terminal history would give rise to an efficient
profitable deviation, since the player could simply compute the next move of his opponent and play
the best response to it.
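The observation that knowing the opponent's random string yields a profitable deviation can be illustrated with a brute-force learner. The sketch below is not the Naor–Rothblum algorithm: it simply keeps the set of all s-bit strings consistent with the observed history, which costs time exponential in s. When all consistent strings agree on Colin's next action, Rowena best-responds to it; otherwise she plays uniformly. The encoding (0 = H, 1 = T, Rowena wins on a match) and the function names are assumptions made for this illustration.

```python
import random
from typing import Callable, List, Tuple

History = Tuple[Tuple[int, int], ...]          # ((row, col), ...) with 0 = H, 1 = T
ColinStrategy = Callable[[History, int], int]  # deterministic given (history, secret r)


def exploit_by_elimination(colin: ColinStrategy, s: int, n: int, secret: int,
                           rng: random.Random) -> float:
    """Rowena's average payoff in n-stage matching pennies (she wins on a match)
    when she tracks every s-bit string consistent with Colin's moves so far."""
    consistent: List[int] = list(range(2 ** s))   # candidates for Colin's secret string
    history: History = ()
    total = 0
    for _ in range(n):
        predicted = {colin(history, r) for r in consistent}
        if len(predicted) == 1:
            a1 = predicted.pop()          # Colin's next move is determined: match it
        else:
            a1 = rng.randrange(2)         # otherwise play the uniform (minmax) strategy
        a2 = colin(history, secret)       # Colin's actual move
        total += 1 if a1 == a2 else -1
        # discard candidates that would not have produced the observed move
        consistent = [r for r in consistent if colin(history, r) == a2]
        history += ((a1, a2),)
    return total / n
```

With a 4-bit secret that Colin cycles through, the secret is pinned down after four stages and Rowena wins every later stage. Because this learner enumerates all 2^s candidates, it says nothing about computationally bounded players; that is exactly the gap the Naor–Rothblum result addresses.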
Learning adaptively changing distributions. An adaptively changing distribution is given
by a pair of algorithms 𝒢 and 𝒟 for generating an initial state and sampling. The algorithm 𝒢
is a randomized function 𝒢: R → S_p × S_init that outputs an initial public state p_0 and a secret
state s_0. The sampling algorithm 𝒟 is a randomized function 𝒟: S_p × S_s × R → S_p × S_s that
at each stage takes the current public and secret states, updates its secret state and outputs a
new public state. A learning algorithm ℒ for (𝒢, 𝒟) is given the initial public state p_0 (but does
not get the initial secret state s_0), and at each round i either (i) ℒ outputs a prediction of the
conditional distribution 𝒟^{s_0}(p_0, …, p_i) of the public output of 𝒟 given the initial secret s_0 and
the observed public states p_0, …, p_i, or (ii) ℒ proceeds to round i + 1 after observing a new public
state p_{i+1} ← 𝒟^{s_0}(p_0, …, p_i). The goal of the learning algorithm is to output a hypothesis (in the
form of a distribution) that is with high probability close in statistical distance to 𝒟^{s_0}(p_0, …, p_i).
In other words, ℒ is trying to be competitive with somebody who knows the initial secret state s_0. In
the setting where 𝒢 and 𝒟 are efficiently constructible, Naor and Rothblum [NR06] gave an algorithm
ℒ that learns s_0 (in the above sense) in probabilistic polynomial time, provided that one-way functions
do not exist. Moreover, their algorithm outputs a hypothesis after seeing a number of samples
proportional to the entropy of the initial secret state.

Theorem 3 (Naor and Rothblum [NR06]). Almost one-way functions exist if and only if there
exists an adaptively changing distribution (𝒢, 𝒟) and polynomials ε(n), δ(n) such that it is hard
to (δ(n), ε(n))-learn the adaptively changing distribution (𝒢, 𝒟) with O(poly(1/δ(n), 1/ε(n)) · log|S_init|)
samples.

The strategy of the column player (Colin) with limited randomness gives rise to a natural
adaptively changing distribution, and we show that the algorithm of Naor and Rothblum [NR06]
can be used to construct a computationally efficient strategy for the row player (Rowena) that
achieves utility noticeably larger than the value of the stage game. Hence, if one-way functions do
not exist, then in repeated two-player zero-sum games there are no computational Nash equilibria
with strategies that use randomness sub-linear in the number of stages.

Theorem 4. Let G = ((A_1, A_2), u) be a two-player zero-sum strategic game with no weakly dominant
pure strategies and with value v. If almost one-way functions do not exist, then for any strategy
{C_n}_{n∈ℕ} of Colin in the repeated game of G that uses o(n) random bits, there exists a polynomial-time
strategy of Rowena with expected average utility v + δ(n) against {C_n}_{n∈ℕ} for some noticeable
function δ.

Proof. Let {C_n}_{n∈ℕ} be an arbitrary strategy of Colin that takes s(n) ∈ o(n) random bits. Let μ
be the minmax strategy of Rowena in G. We define the following adaptively changing distribution
(𝒢, 𝒟). The generating algorithm 𝒢 on input 1^n outputs a random string of length s(n) as the initial
secret state s_0 and the initial history ∅ of the n-stage repeated game of G as the initial public state
p_0. The sampling algorithm 𝒟 outputs the new secret state s_{i+1} identical to the secret state s_i that
it received as an input (i.e., the secret state remains fixed as the s(n) random coins s_0) and updates
the input public state p_i in the following way. The sampling algorithm parses p_i as a history of
length i in the n-stage repeated game of G and computes Colin's action c_i = C_n(p_i, s_i) at p_i using
randomness s_i. 𝒟 additionally samples Rowena's action r_i ← μ according to her minmax strategy
and then outputs the history (p_i, (r_i, c_i)) of length i + 1 as the new public state p_{i+1}. Note that
after sampling the initial secret state s_0 the only randomness used by 𝒟 is to sample the minmax
strategy of Rowena.
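The adaptively changing distribution (𝒢, 𝒟) defined in this proof can be written out directly. In the sketch below, which assumes Colin's circuit C_n is available as a Python function `colin(history, s)` and Rowena's minmax strategy μ as a list of probabilities (both hypothetical interfaces), `generate` plays the role of 𝒢 and `sample` the role of 𝒟. The secret state is passed through unchanged, so after initialization the only fresh randomness is Rowena's minmax action.

```python
import random
from typing import Callable, Sequence, Tuple

History = Tuple[Tuple[int, int], ...]   # ((Rowena's action, Colin's action), ...)


def generate(s_len: int, rng: random.Random) -> Tuple[History, int]:
    """G(1^n): initial public state p_0 (the empty history) and secret s_0 (s(n) random bits)."""
    s0 = rng.getrandbits(s_len) if s_len > 0 else 0
    return (), s0


def sample(p: History, s: int, colin: Callable[[History, int], int],
           mu: Sequence[float], rng: random.Random) -> Tuple[History, int]:
    """D(p_i, s_i): return (p_{i+1}, s_{i+1}) where s_{i+1} = s_i."""
    c = colin(p, s)                                        # c_i = C_n(p_i, s_i)
    r = rng.choices(range(len(mu)), weights=mu, k=1)[0]    # r_i <- mu
    return p + ((r, c),), s                                # secret state stays fixed
```

Observing the public states p_1, p_2, … is exactly observing the history of the repeated game, which is why a learner for this distribution can be run by Rowena during play.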
It follows from Theorem 3 that there exists an efficient learning algorithm ℒ that after at
most k = k(n) ∈ O(s(n) · poly(1/δ(n), 1/ε(n))) samples from 𝒟 outputs a hypothesis h such that
Pr[SD(𝒟^{s_0}(p_0, …, p_i), h) ≤ ε(n)] ≥ 1 − δ(n), where i is the round in which ℒ outputs h. Consider
the strategy of Rowena that uses ℒ in order to learn Colin's random coins: at each stage i it runs ℒ
on the current history, and if ℒ outputs some hypothesis h, then it plays the best response to Colin's
action at stage i sampled according to h; otherwise it plays according to Rowena's minmax
strategy μ. This strategy can be efficiently implemented, and it achieves expectation at least v in
the n − 1 stages in which Rowena plays according to her minmax strategy.* It remains to show
that Rowena has a noticeable advantage over the value of the game at the stage in which ℒ outputs
the hypothesis h about s_0 and Rowena selects her strategy as the best response to Colin's action
sampled according to h.

First, note that since G has no weakly dominant strategies, the best response to any pure action
a_2 of Colin achieves a positive advantage over the value of the game. This observation follows from
the fact that Rowena's minmax strategy achieves expectation at least v against any action of Colin,
and from the fact that the minmax strategy must be mixed (as there are no weakly dominant
strategies). By moving all the probability in the minmax strategy to the action with the highest payoff
given that Colin plays a_2, Rowena achieves a value strictly larger than v. Hence, there exists
some constant ε (depending only on G) such that if h is ε-close in statistical distance to
𝒟^{s_0}(p_0, …, p_i), then the best response against h achieves expectation at least v + c for
some constant c > 0. Moreover, it is good enough if ℒ outputs such an h with probability at least
1 − δ for some constant δ > 0. Since ε and δ can be constant, for all large enough n the learning
algorithm ℒ outputs the hypothesis after receiving at most k < n samples, which (for a small enough
constant δ) allows Rowena to get expectation at least v + c/2 at that stage, and hence an average
advantage of at least c/(2n) over v, which is noticeable. □
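The first step of this argument, that Rowena's best response to any pure action of Colin beats the value, can be checked on small games. The sketch below assumes Rowena's payoff matrix and the value v are given (it does not solve for the minmax strategy) and computes, for each pure action a_2 of Colin, the gain max_{a_1} u(a_1, a_2) − v. The helper name is hypothetical.

```python
from typing import List, Sequence


def best_response_gains(u: Sequence[Sequence[float]], v: float) -> List[float]:
    """For each pure action a2 of Colin, return max over a1 of u(a1, a2) minus v."""
    n_cols = len(u[0])
    return [max(row[a2] for row in u) - v for a2 in range(n_cols)]
```

For rock-paper-scissors (value 0, no weakly dominant pure strategy) every gain equals 1. For the game [[1, 1], [0, 0]], where Rowena's first action is dominant and the value is 1, every gain is 0, which shows why the theorem excludes weakly dominant pure strategies.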

It follows from Theorem 4 that if one-way functions do not exist, then there is no computational
Nash equilibrium of repeated two-player zero-sum games where one of the players uses random
strings of length sub-linear in the number of stages.

Corollary 1. Let G = ((A_1, A_2), u) be a two-player zero-sum strategic game with no weakly dominant
pure strategies and with value v. If almost one-way functions do not exist, then there is no
computational Nash equilibrium of the repeated game of G in which the strategy of one of the players
uses o(n) random bits.

Proof. Assume that there exists a computational Nash equilibrium ({C_n^1}_{n∈ℕ}, {C_n^2}_{n∈ℕ}) of {G^n}_{n∈ℕ},
the repeated game of G, in which the strategy of one of the players uses random strings of length
o(n). Without loss of generality, let Colin be the player whose strategy uses randomness sub-linear
in the number of stages.

Denote by w(n) the expectation of Rowena in this computational Nash equilibrium, i.e., for
all n ∈ ℕ, w(n) = E[u_1^*(C_n^1, C_n^2)]. First, consider the case when w(n) ≤ v + η(n) for some
negligible function η. By Theorem 4 there exists a polynomial-time strategy of Rowena that achieves
expectation v + δ(n) against {C_n^2}_{n∈ℕ} for some noticeable function δ. Thus, this strategy constitutes
Rowena's computationally efficient deviation from the above strategy profile that is profitable by some
non-negligible amount. Second, consider the case when w(n) = v + δ(n) for some noticeable function
δ. Colin can efficiently approximate the strategy that at each stage plays his minmax strategy in
the stage game, and thereby achieve expected payoff in the repeated game at least −v − η(n), where
η is a negligible function. Such a strategy constitutes Colin's computationally efficient deviation that
achieves a non-negligible advantage over the above utility profile. In both cases, ({C_n^1}_{n∈ℕ}, {C_n^2}_{n∈ℕ})
is not a computational Nash equilibrium of the repeated game of G. □

* Note that if ℒ does not output a hypothesis at the current stage, then Rowena chooses her action
according to the same distribution as in her minmax strategy, and her expectation is v.

5 Strong Exploitation of Low-Entropy Opponents 

We showed in the previous sections that equilibrium strategies in repeated two-player zero-sum
games (both with and without restrictions on the computational power of the players) require entropy
at least linear in the number of stages. A natural approach for enabling equilibria that require a lower
amount of randomness might be to relax the solution concept and consider ε-Nash equilibria, i.e.,
to ask what amount of randomness is necessary for equilibrium strategies when the players are
indifferent to improvements in utility smaller than ε.

As can be seen from the following argument, an equivalent question is how much a player can
exploit an opponent that uses a low-entropy strategy. Let α be an entropy level such that Rowena
can exploit any strategy of Colin of entropy below α by more than ε (i.e., she can achieve expected
utility in the repeated game exceeding the value of the stage game by more than ε). Then in
any ε-Nash equilibrium of the repeated game the strategy of the column player must be of entropy
at least α.

5.1 Computationally Unbounded Players 

The performance of strategies with bounded entropy in repeated two-player zero-sum games was
previously studied in the standard setting with players that do not face any computational
limitations. In this direction, Neyman and Okada [NO99] introduced a notion of strategic entropy
in the context of repeated two-player zero-sum games in order to analyze repeated games played
by bounded automata or players with bounded recall. Subsequently, Neyman and Okada [NO00] gave an asymptotic
characterization of the value of repeated two-player zero-sum games when one of the players is
restricted to strategies of bounded strategic entropy. In particular, they showed that if the row
player can use strategies of strategic entropy at most γn, then in the n-stage game she can guarantee
expected average utility at most (cav U)(γ), where U(γ) is the maximal expected utility the
row player can guarantee in the stage game by a strategy of entropy at most γ, and cav U is the
concavification of U (i.e., the smallest concave function larger than or equal to U for all γ ≥ 0).

Repeated matching pennies. For the special case of the repeated game of matching pennies
(given in Figure 3), Budinich and Fortnow [BF11] noticed a smooth tradeoff between the amount of
entropy available to players and the necessary relaxation of the Nash equilibrium solution concept.
In particular, they showed that in any ε-Nash equilibrium of the n-stage repeated game of matching
pennies the players must use strategies of entropy at least (1 − ε)n (for all 0 < ε < 1). Their result
follows by observing that in the n-stage game of matching pennies, for all 0 < ε < 1, the best response
of the column player to any strategy of the row player of entropy at most (1 − ε)n achieves expected
utility at least ε. This observation can be derived from the result of Neyman and Okada [NO00] by
noticing that in the one-shot game of matching pennies (cav U)(1 − ε) = −ε. Hence, in the n-stage
game of matching pennies the row player can guarantee for herself average expected utility at most
(cav U)(1 − ε) = −ε by a strategy of entropy at most (1 − ε)n, and equivalently the column player
can achieve expectation at least ε.
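The identity (cav U)(1 − ε) = −ε for matching pennies can be verified directly. A row strategy that plays H with probability p has entropy h(p), the binary entropy, and guarantees min{2p − 1, 1 − 2p} = −|2p − 1| against Colin's pure actions. The derivation below uses only the standard inequality h(p) ≥ 2 min(p, 1 − p) and is restricted to γ ∈ [0, 1]:

$$
\begin{aligned}
U(\gamma) &= \max_{p:\,h(p)\le\gamma}\ -|2p-1|, \qquad U(0) = -1, \qquad U(1) = 0,\\
-|2p-1| &= 2\min(p,1-p) - 1 \;\le\; h(p) - 1 \;\le\; \gamma - 1 \quad\text{whenever } h(p)\le\gamma,\\
\text{so } U(\gamma) &\le \gamma - 1 \text{ on } [0,1], \text{ with equality at } \gamma\in\{0,1\},\\
(\operatorname{cav} U)(\gamma) &= \gamma - 1 \text{ on } [0,1], \qquad (\operatorname{cav} U)(1-\varepsilon) = -\varepsilon .
\end{aligned}
$$

The linear function γ − 1 is concave and lies above U, and any concave function above U must lie above the chord joining (0, U(0)) and (1, U(1)), which is exactly γ − 1.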

In fact, the result of Neyman and Okada [NO00] implies that the relation between ε-Nash
equilibria and the entropy of the players' strategies can be extended to all repeated two-player
zero-sum games.

Theorem 5. Let G = ((A_1, A_2), u) be a two-player zero-sum strategic game of value v and let β > 0
denote the minimal entropy of a minmax strategy for the column player in G. For any 0 < ε < 1,
there exists c > 0 such that if σ is a strategy of the column player of entropy (1 − ε)βn in the
n-stage repeated game of G, then the row player has a deterministic strategy that achieves average
payoff of at least v + c against σ.

For completeness we provide the proof of Theorem 5 in Appendix A.

Limits on exploiting a low-entropy opponent in non-zero-sum games. In repeated
non-zero-sum games, unlike in repeated two-player zero-sum games, it is in general not possible for
a player to always achieve utility strictly above his minmax level given that his opponent uses a
low-entropy strategy. We illustrate this phenomenon on the game G given by the payoff matrix in the
figure that we discussed in Section 3. Note that if Colin plays his pure action "left", then Rowena
gets utility 0, her minmax payoff, irrespective of her strategy. Even though Colin needs at least
one random bit to play his equilibrium strategy in G, Rowena cannot benefit from the imperfect
play of her opponent at all. Note that this limitation occurs even though any strategy of Colin in a Nash
equilibrium of the repeated game of G must use randomness linear in the number of stages.

5.2 Computationally Efficient Players 

Our results from Section 4 (i.e., Theorem 4) show that if one-way functions do not exist, then it is
possible to efficiently gain a noticeable advantage over an opponent that uses randomness sub-linear
in the number of stages. We find it an intriguing open problem to show a stronger version of
Theorem 4, analogous to known results in the setting with computationally unbounded players (i.e.,
Theorem 5); in particular, to show that it is possible to efficiently gain a constant advantage over
an opponent that uses randomness sub-linear in the number of stages (even for the special case
of the repeated game of matching pennies).
A Exploiting Low Entropy in Two-Player Zero-Sum Games


In this appendix we provide the proof of Theorem 5, which establishes that if one player uses a
constant fraction less randomness in the repeated two-player zero-sum game, then the other player
can obtain an average payoff that is larger than the value of the stage game by a constant.

We use the following lemma about the performance of low-entropy strategies in two-player zero-sum
games in the proof of Theorem 5.

Lemma 1. Let G = ((A_1, A_2), u) be a two-player zero-sum strategic game of value v and let β > 0
denote the minimal entropy of a minmax strategy for the column player in G. For every ε > 0,
there exists c_ε > 0 such that if σ is a strategy of the column player of entropy (1 − ε)β, then the row
player has a strategy that achieves utility at least v + c_ε against σ.

Proof. Let σ be an arbitrary strategy of Colin in G of entropy (1 − ε) · β for some ε > 0, and
let ρ_σ denote the best response strategy of Rowena to σ. First, we show that Rowena's expected
utility E[u(ρ_σ, σ)] is at least v + c for some c > 0. Suppose to the contrary that Rowena's best
response to σ achieves expectation at most v. Let μ be the minmax strategy of Rowena in G; then the
profile (μ, σ) is a Nash equilibrium of G. Indeed, Rowena's minmax strategy guarantees at least the value
of the game v. On the other hand, by the hypothesis her best response to σ achieves at most v,
so Rowena's expectation in (μ, σ) is equal to v. There are no profitable deviations for Colin, since
he cannot decrease Rowena's expectation below v given that she plays according to her minmax
strategy. The strategy σ of Colin is of entropy (1 − ε) · β < β, and the strategy profile (μ, σ) is a
Nash equilibrium of G, contradicting that β is the minimal entropy of Colin's strategy in any Nash
equilibrium of G. Hence, the best response to σ must increase Rowena's expectation by a non-zero
amount over v. The statement of the lemma follows by setting c_ε to be the infimum of the set of
all c achieved against Colin's strategies of entropy (1 − ε) · β. □

Theorem 5. Let G = ((A_1, A_2), u) be a two-player zero-sum strategic game of value v and let
β > 0 denote the minimal entropy of a minmax strategy for the column player in G. For any
0 < ε < 1, there exists c > 0 such that if σ is a strategy of the column player of entropy (1 − ε)βn
in the n-stage repeated game of G, then the row player has a deterministic strategy that achieves
average payoff of at least v + c against σ.

Proof. Let σ be an arbitrary strategy of the column player (Colin) of Shannon entropy nβ(1 − ε) for
some ε ∈ [0, 1]. Let ρ_σ be the strategy of the row player (Rowena) that at each non-terminal history
a plays the best response in G to Colin's strategy σ(a). Rowena's expectation E_{a←(ρ_σ,σ)}[u^*(a)] is

$$\frac{1}{n}\Big(\mathbb{E}_{a\leftarrow(\rho_\sigma,\sigma)}[u(a^1)] + \mathbb{E}_{a\leftarrow(\rho_\sigma,\sigma)}[u(a^2)\mid a^1] + \cdots + \mathbb{E}_{a\leftarrow(\rho_\sigma,\sigma)}[u(a^n)\mid (a^1,\ldots,a^{n-1})]\Big).$$

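By the definition of conditional expectation, we rewrite her expectation as a summation over all
terminal histories b ∈ (A_1 × A_2)^n, i.e.,

$$\frac{1}{n}\sum_{b}\Pr_{(\rho_\sigma,\sigma)}[b]\cdot\Big(\mathbb{E}[u(a^1)] + \mathbb{E}[u(a^2)\mid a^1 = b^1] + \cdots + \mathbb{E}[u(a^n)\mid (a^1,\ldots,a^{n-1}) = (b^1,\ldots,b^{n-1})]\Big), \qquad (1)$$

where all expectations are over a ← (ρ_σ, σ).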
Note that for every terminal history b ∈ (A_1 × A_2)^n the summands correspond to the expectation of
Rowena at the non-terminal subhistories of b. For any terminal history b, the total sum of
entropy used in σ at the subhistories of b is at most (1 − ε)βn, which implies that there are at
least n' = n(1 − (1 − ε)/(1 − ε/2)) subhistories of b where Colin's strategy has entropy at most
(1 − ε/2)β. To see this, assume that there exists a terminal history b with fewer than n' subhistories
where σ uses entropy at most (1 − ε/2)β. Then the total entropy of σ on all subhistories of b is
strictly larger than

$$(n-n')\Big(1-\frac{\varepsilon}{2}\Big)\beta = n\Big(1-\Big(1-\frac{1-\varepsilon}{1-\varepsilon/2}\Big)\Big)\Big(1-\frac{\varepsilon}{2}\Big)\beta = (1-\varepsilon)\beta n,$$

a contradiction. As shown in Lemma 1, for each subhistory of b where Colin uses a strategy of
entropy at most (1 − ε/2)β, Rowena's best response achieves at least v + c, where c = c_{ε/2} > 0 is a
value determined by the game G (and a function of ε). On all other subhistories of b (where
Colin's strategy has entropy larger than (1 − ε/2)β) the value of Rowena is at least v. Therefore the
total utility (the sum of the expectations over all subhistories of b) is at least nv + c · n' = n(v + c'),
where c' = c(1 − (1 − ε)/(1 − ε/2)) > 0.
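As a sanity check of the counting step, take ε = 1/2. Then (1 − ε)/(1 − ε/2) = 2/3, so n' = n/3, the guaranteed average gain is c' = c/3, and the bound used in the contradiction argument reads

$$(n-n')\Big(1-\frac{\varepsilon}{2}\Big)\beta = \frac{2n}{3}\cdot\frac{3}{4}\,\beta = \frac{n\beta}{2} = (1-\varepsilon)\beta n .$$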

Since this holds for every terminal history of G^n, it follows from (1) that the strategy of
Rowena achieves average expected utility at least v + c' against σ in G^n. □

Note that the constant c by which the row player can exploit a strategy of the column player of
entropy (1 − ε)βn is related to the possible gain of the row player in the stage game, given that
the column player plays a strategy of entropy (1 − ε)β. To make the connection explicit, we use the
following notation from Neyman and Okada [NO00]. Let G = ((A_1, A_2), u) be the stage game and
for γ ≥ 0 define

$$U(\gamma) = \max_{\substack{\sigma\in\Delta(A_1)\\ H(\sigma)\le\gamma}}\ \min_{a_2\in A_2}\ \mathbb{E}[u(\sigma,a_2)].$$

Hence, U(γ) is the maximal expected utility the row player can guarantee with a strategy of entropy
at most γ; or equivalently, −U(γ) is the minimal expected utility that the column player can achieve
by a best response to any strategy of the row player of entropy at most γ. Note that U(0) is equal
to the row player's minmax level in pure strategies, and for all γ ≥ 0, U(γ) is at most the value of
the game. Using this notation the statement of Theorem 5 can be restated as:
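When the row player has two actions, U(γ) can be approximated by a grid search over her mixed strategies. The sketch below assumes Rowena's 2 × k payoff matrix is given and measures entropy in bits; the grid resolution is an arbitrary choice, so the result is only accurate up to that resolution. The function names are hypothetical.

```python
import math
from typing import Sequence


def binary_entropy(p: float) -> float:
    """Shannon entropy (in bits) of the distribution (p, 1 - p)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def entropy_bounded_value(u: Sequence[Sequence[float]], gamma: float,
                          steps: int = 10_000) -> float:
    """Approximate U(gamma): the best worst-case expected payoff over row
    mixtures (p, 1 - p) whose entropy is at most gamma."""
    best = -math.inf
    for k in range(steps + 1):
        p = k / steps                          # probability of the first row action
        if binary_entropy(p) > gamma + 1e-12:
            continue                           # entropy constraint H(sigma) <= gamma fails
        worst = min(p * u[0][j] + (1 - p) * u[1][j] for j in range(len(u[0])))
        best = max(best, worst)
    return best
```

For matching pennies this gives U(0) = −1, U(1) = 0 and U(0.5) ≈ −0.78, strictly below the chord value 0.5 − 1 = −0.5; this gap between U and its concavification is what the cav U refinement exploits.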

Theorem 5 (restated). Let G = ((A_1, A_2), u) be a two-player zero-sum strategic game of value
v and let β > 0 denote the minimal entropy of a minmax strategy for the row player in G. For any
0 < ε < 1, if σ is a strategy of the row player of entropy (1 − ε)βn in the n-stage repeated game
of G, then the column player has a deterministic strategy that achieves average payoff of at least

$$-\frac{1-\varepsilon}{1-\varepsilon/2}\,v - \Big(1-\frac{1-\varepsilon}{1-\varepsilon/2}\Big)\,U\Big(\Big(1-\frac{\varepsilon}{2}\Big)\beta\Big)$$

against σ.

We remark that an improved bound on the expectation can be obtained using the technique
of Neyman and Okada: the column player can in fact achieve average expected utility at least
−(cav U)((1 − ε)β), where cav U is the smallest concave function larger than or equal to U.

Theorem 6 below can be seen as a "converse" of Theorem 5. Specifically, we show that even if
the players are restricted to strategies of entropy (1 − ε)βn, there exists an ε'-Nash equilibrium
of G^n for some ε' proportional to ε.
Theorem 6. Let G be a two-player zero-sum strategic game such that the minimal entropy of a
minmax strategy is β > 0 for both players. There exists c > 0 such that for all 0 < ε < 1 and for
all n, there exists a (c · (⌈nε⌉ + 1)/n)-Nash equilibrium of the n-stage repeated game of G in which the
players' strategies are of entropy at most (1 − ε)βn.

Proof. Let σ be the strategy profile in the n-stage repeated game of G in which the players play in
the first ⌊n(1 − ε)⌋ stages according to their minmax strategies of minimal entropy (i.e., entropy β),
and in the remaining ⌈nε⌉ stages the players alternate between playing the (pure) action profiles
a^* ∈ A_1 × A_2 and a_* ∈ A_1 × A_2, such that p^* = u(a^*) is the maximum payoff of Rowena in G and
p_* = u(a_*) is the minimal payoff of Rowena in G. Note that by construction of σ, the players use
strategies of entropy at most n · β(1 − ε).

Assume ⌈nε⌉ is odd (the argument for ⌈nε⌉ even is analogous). The expected utility of Rowena
in σ in the n-stage repeated game of G is

$$\mathbb{E}[u^*(\sigma)] = \frac{1}{n}\Big(\lfloor n(1-\varepsilon)\rfloor\cdot v + \frac{1}{2}\big(\lceil n\varepsilon\rceil - 1\big)\big(p^* + p_*\big) + p^*\Big),$$

where v is the value of G. The expectation of every deviating strategy σ'_2 of Colin is

$$-\mathbb{E}[u^*(\sigma_1,\sigma_2')] \le \frac{1}{n}\Big(-\lfloor n(1-\varepsilon)\rfloor\cdot v + \frac{1}{2}\big(\lceil n\varepsilon\rceil - 1\big)\big(-p_* - p_*\big) - p_*\Big),$$

hence Colin can increase his utility by at most (1/(2n))(p^* − p_*)(⌈nε⌉ + 1). Similarly, the increase in
expectation from any deviating strategy of Rowena can be upper bounded by (1/(2n))(p^* − p_*)(⌈nε⌉ − 1).
Therefore, σ is a (c · (⌈nε⌉ + 1)/n)-Nash equilibrium of the n-stage repeated game of G for c = ½(p^* − p_*),
and the statement of the theorem follows since ½(p^* − p_*) is a constant independent of ε and
n. □
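The deviation bounds in this proof can be checked exactly for a concrete stage game. The sketch below assumes Rowena's payoff matrix and the value v are given, takes ε as an exact fraction, and starts the alternation with a^* (as in the expression for E[u^*(σ)]). It models only the last ⌈nε⌉ stages explicitly: in the first ⌊n(1 − ε)⌋ stages both players use minmax strategies, so each such stage contributes v and admits no profitable deviation. Because the alternating tail does not react to deviations, stage-by-stage best responses are optimal deviations there. The function name is hypothetical.

```python
from fractions import Fraction
from math import floor
from typing import Sequence, Tuple


def theorem6_gains(u: Sequence[Sequence[int]], v: Fraction, n: int,
                   eps: Fraction) -> Tuple[Fraction, Fraction, Fraction, Fraction]:
    """Return (Rowena's average payoff, Rowena's best deviation gain,
    Colin's best deviation gain, the bound c * (ceil(n*eps) + 1) / n)."""
    cells = [(i, j) for i in range(len(u)) for j in range(len(u[0]))]
    a_hi = max(cells, key=lambda c: u[c[0]][c[1]])   # a^*: Rowena's maximum payoff p^*
    a_lo = min(cells, key=lambda c: u[c[0]][c[1]])   # a_*: Rowena's minimum payoff p_*
    first = floor(n * (1 - eps))                     # minmax stages, each worth v
    tail_len = n - first                             # equals ceil(n * eps)
    tail = [a_hi if k % 2 == 0 else a_lo for k in range(tail_len)]   # a^*, a_*, a^*, ...
    payoff = (first * Fraction(v) + sum(u[i][j] for i, j in tail)) / n
    # Rowena keeps Colin's tail actions fixed and picks her best row at each stage.
    row_gain = Fraction(sum(max(r[j] for r in u) - u[i][j] for i, j in tail), n)
    # Colin keeps Rowena's tail actions fixed and picks the column minimizing her payoff.
    col_gain = Fraction(sum(u[i][j] - min(u[i]) for i, j in tail), n)
    c = Fraction(u[a_hi[0]][a_hi[1]] - u[a_lo[0]][a_lo[1]], 2)
    return payoff, row_gain, col_gain, c * (tail_len + 1) / n
```

For matching pennies with n = 10 and ε = 3/10, the tail has three stages, Rowena's average payoff is 1/10, her best gain is 1/5 = c(⌈nε⌉ − 1)/n, and Colin's best gain is 2/5 = c(⌈nε⌉ + 1)/n with c = 1.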

B Matching Pennies 

The game of matching pennies is a two-player zero-sum strategic game given by the payoff matrix
in Figure 3. Both players can either play Heads (H) or Tails (T). The only Nash equilibrium is
the strategy profile (½H + ½T, ½H + ½T) in which both players randomize uniformly over H and
T. By Theorem 1, in the equilibrium of the n-stage repeated game of matching pennies both
players randomize uniformly between playing Heads and Tails at each stage, and the entropy of
the equilibrium strategy of each player is exactly n.

                 Heads (H)    Tails (T)
    Heads (H)     1, −1        −1, 1
    Tails (T)     −1, 1         1, −1

Figure 3: The payoff matrix of the game of matching pennies.

We now give a generalization of Lemma 3.1 from Budinich and Fortnow [BF11].

Theorem 7. For any ε ∈ [0, 1], let σ be a strategy of the column player of entropy n(1 − ε) in
the n-stage repeated game of matching pennies. The row player has a deterministic strategy that
achieves payoff of at least ε against σ.
Proof. Let ρ_σ be the strategy of the row player (Rowena) that finds the most likely action of the
column player (Colin) at each history and plays the best response to that action. For any stage
t = 1, …, n, and any terminal history a ∈ (A_1 × A_2)^n we denote by p_t^* the probability of Colin's most likely
action at stage t at the subhistory (a^1, …, a^{t−1}).




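Consider the following function φ: (A_1 × A_2)^n × {1, …, n, n + 1} → ℝ, defined for any terminal history
a ∈ (A_1 × A_2)^n and any t ∈ {1, …, n, n + 1} as

$$\varphi(a,t) = \sum_{i=1}^{t-1} u(a^i) - H(\sigma_{\ge t}(a)),$$

where σ_{≥t}(a) ∈ Δ(×_{i=t}^{n} A_2) is the distribution of the actions taken by Colin in σ at stages t, …, n given
that the history of the play up to stage t is (a^1, …, a^{t−1}). Note that for t = n + 1, Colin has no more
actions to take, and by convention we write H(σ_{≥n+1}(a)) = 0, so that φ(a, n + 1) = Σ_{i=1}^{n} u(a^i) (i.e., the
total accumulated utility of Rowena at the terminal history (a^1, …, a^n)). Also note that for any
terminal history a the value of φ(a, 1) is −H(σ_{≥1}(a)) = −H(σ), i.e., minus the entropy of the distribution
σ of Colin's play in all the n stages.

Now consider the expected increase in φ between two consecutive stages when Colin's actions
are drawn from σ and Rowena's actions are chosen according to ρ_σ, i.e., for every t ∈ {1, …, n}
consider

$$\mathbb{E}_{a\leftarrow(\rho_\sigma,\sigma)}\big[\varphi(a,t+1)-\varphi(a,t)\big].$$

We expand the above using the definition of φ and get

$$\mathbb{E}_{a\leftarrow(\rho_\sigma,\sigma)}\big[u(a^t) - H(\sigma_{\ge t+1}(a)) + H(\sigma_{\ge t}(a))\big],$$

which can be simplified using the probability p_t^* of the most likely action of Colin at history (a^1, …, a^{t−1})
as

$$\mathbb{E}_{a\leftarrow(\rho_\sigma,\sigma)}\big[2p_t^*-1 + \big(-H(\sigma_{\ge t+1}(a)) + H(\sigma_{\ge t}(a))\big)\big].$$

We can expand the first entropy term:

$$\mathbb{E}_{a\leftarrow(\rho_\sigma,\sigma)}\Big[2p_t^*-1 + \Big(-\big(p_t^*\cdot H(\sigma_{\ge t+1}(a)\mid a_2^t = c_t^*) + (1-p_t^*)\cdot H(\sigma_{\ge t+1}(a)\mid a_2^t = \bar{c}_t)\big) + H(\sigma_{\ge t}(a))\Big)\Big],$$

where c_t^* denotes the most likely action of Colin at stage t after history (a^1, …, a^{t−1}), and c̄_t denotes
its alternative. Using the definition of conditional entropy, we can rewrite the expression as

$$\mathbb{E}_{a\leftarrow(\rho_\sigma,\sigma)}\big[2p_t^*-1 + \big(-H(\sigma_{\ge t+1}(a)\mid\sigma_t(a)) + H(\sigma_{\ge t}(a))\big)\big],$$

where σ_t(a) ∈ Δ(A_2) denotes the distribution of Colin's action at stage t after the history (a^1, …, a^{t−1}).
By the chain rule for conditional entropy, H(σ_{≥t}(a)) = H(σ_t(a)) + H(σ_{≥t+1}(a) | σ_t(a)); moreover, the
binary entropy satisfies H(σ_t(a)) ≥ 2(1 − p_t^*) for p_t^* ∈ [1/2, 1]. Hence

$$\mathbb{E}_{a\leftarrow(\rho_\sigma,\sigma)}\big[\varphi(a,t+1)-\varphi(a,t)\big] = \mathbb{E}_{a\leftarrow(\rho_\sigma,\sigma)}\big[2p_t^*-1 + H(\sigma_t(a))\big] \ge \mathbb{E}_{a\leftarrow(\rho_\sigma,\sigma)}\big[2p_t^*-1 + (-2p_t^*+2)\big] = 1 .$$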
Finally, we use the above lower bound on the expected increase of φ to bound the expectation
of Rowena when the players play according to the strategy profile (ρ_σ, σ):

$$n\cdot\mathbb{E}_{a\leftarrow(\rho_\sigma,\sigma)}[u^*(a)] = \mathbb{E}_{a\leftarrow(\rho_\sigma,\sigma)}[\varphi(a,n+1)] \ge \mathbb{E}_{a\leftarrow(\rho_\sigma,\sigma)}[\varphi(a,1)] + n\cdot\min_{t\in[n]}\mathbb{E}_{a\leftarrow(\rho_\sigma,\sigma)}\big[\varphi(a,t+1)-\varphi(a,t)\big] \ge -H(\sigma) + n = -n(1-\varepsilon) + n = n\varepsilon .$$

Therefore, the expected average payoff of Rowena is at least ε. □
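Theorem 7 can be checked by exact enumeration for small n. The sketch below assumes Colin's strategy is given as a hypothetical function `prob_heads(history)` returning the probability that he plays H at that history, encodes H as 0 and T as 1, and implements ρ_σ as "match Colin's most likely action" with ties broken toward H. Since Rowena is deterministic, histories correspond one-to-one to Colin's action sequences, so by the chain rule the entropy of his play is the sum of stage-wise binary entropies weighted by the probability of reaching each history. The theorem predicts a total payoff of at least n − H(σ).

```python
import math
from typing import Callable, Tuple

History = Tuple[Tuple[int, int], ...]      # ((row, col), ...) with 0 = H, 1 = T


def binary_entropy(q: float) -> float:
    """Entropy in bits of a coin that lands H with probability q."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -(q * math.log2(q) + (1 - q) * math.log2(1 - q))


def most_likely_response(prob_heads: Callable[[History], float],
                         n: int) -> Tuple[float, float]:
    """Return (Rowena's expected total payoff, entropy of Colin's n actions) when
    Rowena matches Colin's most likely action at every history."""
    payoff = 0.0
    entropy = 0.0

    def visit(history: History, reach: float) -> None:
        nonlocal payoff, entropy
        if len(history) == n:
            return
        q = prob_heads(history)
        row = 0 if q >= 0.5 else 1                 # best response to the most likely action
        entropy += reach * binary_entropy(q)       # chain-rule term for this stage
        for col, pc in ((0, q), (1, 1.0 - q)):
            if pc > 0.0:
                payoff += reach * pc * (1 if row == col else -1)   # Rowena wins on a match
                visit(history + ((row, col),), reach * pc)

    visit((), 1.0)
    return payoff, entropy
```

For example, if Colin flips a fair coin at even stages and repeats his previous action at odd stages, then for n = 4 his play has entropy 2 and Rowena's total payoff is exactly n − H(σ) = 2, so the bound of Theorem 7 is tight in this case.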

We also give an alternative and more straightforward proof of Theorem 7 that follows the
structure of the proof of Theorem 5.


Proof of Theorem 7 (alternative). Let σ be an arbitrary strategy of Colin of Shannon entropy n(1 − ε)
for some ε ∈ [0, 1]. Let ρ_σ be the strategy of Rowena that at each non-terminal history a plays
the best response to Colin's strategy σ(a). We can express Rowena's expectation E_{a←(ρ_σ,σ)}[u^*(a)]
as

$$\frac{1}{n}\Big(\mathbb{E}[u(a^1)] + \mathbb{E}[u(a^2)\mid a^1] + \cdots + \mathbb{E}[u(a^n)\mid (a^1,\ldots,a^{n-1})]\Big),$$

where all expectations are over a ← (ρ_σ, σ). By the definition of conditional expectation, this can be
rewritten as a summation over terminal histories

$$\frac{1}{n}\sum_{b}\Pr_{(\rho_\sigma,\sigma)}[b]\cdot\Big(\mathbb{E}[u(a^1)] + \mathbb{E}[u(a^2)\mid a^1 = b^1] + \cdots + \mathbb{E}[u(a^n)\mid (a^1,\ldots,a^{n-1}) = (b^1,\ldots,b^{n-1})]\Big).$$

For every terminal history b = (b_1, …, b_n), the total entropy of σ over the non-terminal subhistories
of b is bounded by n(1 − ε), i.e.,

$$H(\sigma(\emptyset)) + \sum_{i=1}^{n-1} H(\sigma(b_1,\ldots,b_i)) \le n(1-\varepsilon). \qquad (2)$$

We define ε_0 = 1 − H(σ(∅)) and for every i ∈ {1, …, n − 1} we define ε_i = 1 − H(σ(b_1, …, b_i)).
Note that 0 ≤ ε_i ≤ 1 for every i ∈ {0, …, n − 1}, and from inequality (2) we get that ε ≤ (1/n) Σ_{i=0}^{n−1} ε_i.
In order to conclude that Rowena's expected utility in the strategy profile (ρ_σ, σ) is at least ε,
it is sufficient to show that for every subhistory b' of b the expectation E[u(ρ_σ(b'), σ(b'))] is at least
1 − H(σ(b')).

For an arbitrary non-terminal history h, consider Rowena's expectation in G given the strategy
profile (ρ_σ(h), σ(h)). Since ρ_σ(h) is the best response to σ(h), Rowena's expectation is 2p − 1,
where p is the probability of Colin's most probable action at history h. We need to show that for
all p ∈ [1/2, 1]

$$2p - 1 \ge 1 - H(\sigma(h)) = 1 + p\log_2 p + (1-p)\log_2(1-p).$$

For p equal to 1/2 or 1, the left side and the right side of the inequality are equal. Since 2p − 1 is a
linear function and 1 + p log_2(p) + (1 − p) log_2(1 − p) is a convex function on [1/2, 1], the inequality
holds. This concludes the proof. □
It follows from Theorem 7 that if the players can use only strategies of entropy (1 − ε)n (i.e.,
lower than n times the entropy of an equilibrium strategy of the one-shot matching pennies), then Nash
equilibria in the n-stage repeated game of matching pennies do not exist.

Proposition 2. Let G^n be the n-stage repeated game of matching pennies.

1. For all 0 < ε < 1, if σ is an ε-Nash equilibrium of G^n, then the players' strategies in σ are of
entropy at least n(1 − ε).

2. For all 0 < ε < 1, there exists an (ε + 2/n)-Nash equilibrium of G^n in which the players'
strategies are of entropy at most (1 − ε)n.

Proof. First, we show that any e-Nash equilibrium a in the n-stage repeated game of matching 
pennies comprises of strategies of entropy at least (1 — e)n. Assume that there is an e-Nash 
equilibrium in which both players use a strategy of strictly smaller entropy than (1 — e)n, i.e., of 
entropy (1 — e')n for some e' > e. By Theorem [71 each player i has a strategy cj. that achieves at 
least e' against (T_j. Since a is an e-Nash equilibrium then for any player i 

EK*(ct)] > E[< (cr',cr_i)] - e>e'-e>0. 


This implies that for both players E[n*((T)] > 0, however it cannot be the case that the expectation 
of both players is strictly larger than zero, since matching pennies is a zero-sum game. 

Second, we show that if the players can use strategies of entropy $(1-\varepsilon)n$, then there exists an $(\varepsilon + \frac{1}{n})$-Nash equilibrium of the $n$-stage repeated game of matching pennies. To see this, consider the strategy profile in which both players play H and T uniformly at random in the first $\lfloor (1-\varepsilon)n \rfloor$ stages, and in the remaining $\lceil \varepsilon n \rceil$ stages Rowena always plays H while Colin alternates between T and H (i.e., the outcome at stage $\lfloor (1-\varepsilon)n \rfloor + 1$ is (H, T)). If $\lceil \varepsilon n \rceil$ is odd, then Rowena's expectation is $-\frac{1}{n}$; otherwise it is 0. Both Colin and Rowena can improve their expectations only in the last $\lceil \varepsilon n \rceil$ stages, by matching or countering the opponent, but any such deviation achieves utility at most


$$\frac{\lceil \varepsilon n \rceil}{n} \leq \frac{\varepsilon n + 1}{n} = \varepsilon + \frac{1}{n}.$$

Hence, each player can improve their utility by at most $\varepsilon + \frac{1}{n}$ by deviating from the prescribed strategy profile, so the profile constitutes an $(\varepsilon + \frac{1}{n})$-Nash equilibrium. □
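The construction in the second part can be checked mechanically for small parameters. The sketch below works under our own illustrative assumptions: Rowena is the matcher, each stage pays $\pm 1$, and utilities are averaged over the $n$ stages (consistent with the bound $\lceil \varepsilon n \rceil / n$). The hypothetical helper `construction_payoffs` returns the number of uniformly random stages (which bounds each player's entropy), the number of deterministic stages, Rowena's expected utility under the prescribed profile, and the best average utility a deviating player can reach.

```python
import math
from fractions import Fraction

def construction_payoffs(n: int, eps: Fraction):
    """Stage counts and average payoffs for the profile in Proposition 2, part 2.

    Assumptions (ours): Rowena is the matcher, each stage pays +1/-1 to Rowena
    (the negation to Colin), and utilities are averaged over the n stages.
    """
    k_rand = math.floor((1 - eps) * n)  # uniformly random stages: k_rand bits of entropy per player
    k_det = n - k_rand                  # deterministic stages; equals ceil(eps * n)
    # Deterministic phase: Rowena always plays H, Colin plays T, H, T, H, ...
    rowena = ["H"] * k_det
    colin = ["T" if j % 2 == 0 else "H" for j in range(k_det)]
    det_total = sum(1 if r == c else -1 for r, c in zip(rowena, colin))
    # The random phase contributes 0 in expectation, so only det_total matters.
    rowena_value = Fraction(det_total, n)
    # Against a uniformly random opponent no deviation helps, but a deviator can
    # win every deterministic stage, so its best average utility is k_det / n.
    best_deviation = Fraction(k_det, n)
    return k_rand, k_det, rowena_value, best_deviation

for n, eps in [(10, Fraction(1, 10)), (10, Fraction(1, 5)), (7, Fraction(1, 3))]:
    k_rand, k_det, value, best = construction_payoffs(n, eps)
    print(f"n={n} eps={eps}: random stages={k_rand}, deterministic stages={k_det}, "
          f"Rowena's value={value}, best deviation={best}, eps + 1/n={eps + Fraction(1, n)}")
```

Exact rational arithmetic via `fractions.Fraction` avoids floating-point surprises in the floor and ceiling computations.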


B.1 Matching Pennies with Computationally Efficient Players

In this section we prove the statement of Theorem 0] for the special case of the game of matching pennies without relying on the framework of adaptively changing distributions of Naor and Rothblum [NR06], using instead the classical results on pseudorandomness discussed in Section 2.2. In particular, we use the fact that if one-way functions do not exist, then the players cannot efficiently generate unpredictable sequences of bits from only a few truly random bits. Hence, in the repeated game of matching pennies, any player can at some stage efficiently predict and exploit the next move of an opponent who uses a number of random bits sublinear in the number of stages.

Theorem 8. If one-way functions do not exist, then for any polynomial-size circuit family $\{C_n\}$ implementing a strategy of Colin in the repeated game of matching pennies using at most $n - 1$ random bits, there exists a polynomial-time strategy of Rowena with expected utility $\delta(n)$ against $C_n$ for some noticeable function $\delta$.
Proof. Let $\{X_n\}_{n \in \mathbb{N}}$ be the probability ensemble defined, for every $n$, as the random variable over $2n$-bit strings corresponding to the terminal histories of the $n$-stage repeated game of matching pennies (where H corresponds to 0 and T to 1) when Rowena plays uniformly at random and Colin plays according to $C_n$. Note that $X_n$ is of length $2n$ and can be generated in polynomial time from at most $2n - 1$ random bits: $n$ for Rowena's uniformly random actions and at most $n - 1$ for Colin's strategy.

Since one-way functions do not exist, the ensemble $\{X_n\}$ cannot be pseudorandom: otherwise, the polynomial-time procedure mapping $2n - 1$ random bits to a sample of $X_n$ would be a pseudorandom generator, and pseudorandom generators imply one-way functions. In particular, since $\{X_n\}$ is efficiently samplable, it cannot be unpredictable in polynomial time, in the following sense: there exists a polynomial-time predictor algorithm $A$ that reads $x \leftarrow X_n$ bit by bit and predicts the next bit with probability noticeably larger than one half. Formally, let $\mathrm{next}_A(x)$ be the function that returns the $i$-th bit of $x$ if, on input $(1^{|x|}, x)$, algorithm $A$ reads only the first $i - 1 < |x|$ bits of $x$, and returns a uniformly chosen bit if $A$ reads the entire string $x$. Then there exist a predictor algorithm $A$ and a positive polynomial $p$ such that

$$\Pr\left[A(1^{|X_n|}, X_n) = \mathrm{next}_A(X_n)\right] \geq \frac{1}{2} + \frac{1}{p(n)},$$

where the probability is taken over $X_n$ and the internal randomness of $A$.
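To make the success event in this definition concrete, the following Monte Carlo sketch estimates $\Pr[A(1^{|X_n|}, X_n) = \mathrm{next}_A(X_n)]$ for a toy ensemble. Everything here is an illustrative assumption rather than part of the proof: the predictor is modeled as a hypothetical callable that sees the prefix read so far and returns either a guess or `None` (read one more bit), and the toy Colin draws a single random bit and plays it in every stage (H = 0, T = 1), interleaved with Rowena's uniform bits as $(r_1, c_1, \dots, r_n, c_n)$.

```python
import random
from typing import Callable, List, Optional

# Hypothetical predictor interface: given the prefix of x read so far, return a
# guess (0 or 1) for the next bit, or None to read one more bit.
Predictor = Callable[[List[int]], Optional[int]]

def predictor_succeeds(predictor: Predictor, x: List[int], rng: random.Random) -> bool:
    """One sample of the event A(1^{|x|}, x) == next_A(x)."""
    for i in range(len(x)):
        guess = predictor(x[:i])
        if guess is not None:
            # A stopped after reading i bits, so next_A(x) is bit i (0-indexed) of x.
            return guess == x[i]
    # A read all of x: next_A(x) is a fresh uniform bit, so A's output
    # (here arbitrarily 0) is correct with probability exactly 1/2.
    return rng.randrange(2) == 0

def estimate_success(predictor: Predictor,
                     sample_x: Callable[[random.Random], List[int]],
                     trials: int, seed: int = 0) -> float:
    rng = random.Random(seed)
    hits = sum(predictor_succeeds(predictor, sample_x(rng), rng) for _ in range(trials))
    return hits / trials

def toy_history(rng: random.Random, n: int = 5) -> List[int]:
    """Terminal history (r_1, c_1, ..., r_n, c_n): Rowena uniform, Colin repeats one random bit."""
    b = rng.randrange(2)
    x: List[int] = []
    for _ in range(n):
        x += [rng.randrange(2), b]
    return x

def repeat_predictor(prefix: List[int]) -> Optional[int]:
    # Read (r_1, c_1, r_2), then predict that c_2 equals c_1.
    return prefix[1] if len(prefix) == 3 else None

def never_predict(prefix: List[int]) -> Optional[int]:
    return None  # reads everything, so it succeeds with probability 1/2

print(estimate_success(repeat_predictor, toy_history, trials=20_000))
print(estimate_success(never_predict, toy_history, trials=20_000))
```

The toy Colin uses one random bit, well within the $n - 1$ budget of Theorem 8, and `repeat_predictor` always succeeds; a predictor that never stops early succeeds with probability one half in expectation.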

We show that Rowena can guarantee herself a noticeable expected utility by emulating $A$ on the transcript of the repeated game. Consider the strategy $R_A$ of Rowena that, at each stage $i$, samples a uniformly random bit $r_i$; if $A(h_i)$ outputs a prediction $c_i$ of Colin's action, then Rowena plays $c_i$ (to match Colin), and otherwise she plays $r_i$ and uses the action played by Colin at stage $i$ as the next input to $A$. After the stage in which $A$ outputs a prediction, $R_A$ plays uniformly at random. The expectation of Rowena can be lower bounded as follows:

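$$\mathbb{E}[u^*(R_A, C)] \geq \frac{1}{n}\left(\Pr[A \text{ outputs } c_i]\cdot\left((n-1)\cdot 0 + 2\left(\frac{1}{2} + \frac{1}{p(n)}\right) - 1\right) + \left(1 - \Pr[A \text{ outputs } c_i]\right)\cdot 0\right).$$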

Recall that Rowena's actions are chosen uniformly at random, and that the predictor $A$ has to guess a uniformly random bit if it reads the whole terminal history $x \leftarrow X_n$. Hence, in order to gain a noticeable advantage over one half, $A$ must output a prediction of one of Colin's actions with at least noticeable probability, i.e., $\Pr[A \text{ outputs } c_i] \geq \delta'(n)$ for some noticeable function $\delta'$. Thus, the strategy $R_A$ achieves expectation at least $\delta(n) = \delta'(n) \cdot (n \cdot p(n))^{-1}$, which is a noticeable function of $n$. □
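The strategy $R_A$ can be illustrated with a small simulation. This is a sketch under our own assumptions, not the construction used in the proof: the predictor is a hypothetical callable that receives the bits read so far and returns a guess or `None`, actions are encoded as H = 0 and T = 1, each stage pays $\pm 1$ to Rowena as the matcher, utilities are averaged over the $n$ stages, and the toy Colin ignores the history and repeats a single random bit. Because moves are simultaneous, replacing $r_i$ by the prediction $c_i$ does not change Colin's action at stage $i$.

```python
import random
from typing import Callable, List, Optional

# Hypothetical predictor interface: prefix of (r_1, c_1, r_2, c_2, ...) -> guess or None.
Predictor = Callable[[List[int]], Optional[int]]

def play_RA(predictor: Predictor, colin: List[int], rng: random.Random) -> float:
    """Rowena's average utility when she plays R_A against a fixed action sequence of Colin."""
    transcript: List[int] = []  # bits fed to A, in the order (r_1, c_1, r_2, c_2, ...)
    total, emulating = 0, True
    for c in colin:
        r = rng.randrange(2)  # Rowena's uniformly random bit r_i
        move = r
        if emulating:
            if predictor(transcript) is not None:
                # A predicts Rowena's own uniform bit: it stops, and R_A plays uniformly from now on.
                emulating = False
            else:
                guess = predictor(transcript + [r])
                if guess is not None:
                    move, emulating = guess, False  # match the predicted action of Colin
        total += 1 if move == c else -1  # Rowena, the matcher, gets +1 on a match
        transcript += [r, c]
    return total / len(colin)

def repeat_predictor(prefix: List[int]) -> Optional[int]:
    # Read (r_1, c_1, r_2), then predict that c_2 equals c_1.
    return prefix[1] if len(prefix) == 3 else None

def average_utility(n: int, trials: int, seed: int = 0) -> float:
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        b = rng.randrange(2)  # the toy Colin's single random bit
        acc += play_RA(repeat_predictor, [b] * n, rng)
    return acc / trials

print(average_utility(n=5, trials=20_000))
```

In this toy, Rowena's expected average utility is exactly $1/n$: she wins stage 2 with certainty and every other stage contributes 0 in expectation, mirroring the single term in the bound above.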